Exercise 1

1. **Financial reports in Excel** → Structured (rows/columns, fixed fields)
2. **Social media photographs** → Unstructured (image content)
3. **News articles on a website** → Unstructured (free-form text)
4. **Inventory data in relational DB** → Structured (tables with schema)
5. **Recorded interviews (audio)** → Unstructured (audio data)


Exercise 2

1) Blog posts about travel experiences → Structured trip records

**Method:**
1. Collect blog posts and metadata (title, author, date, URL).
2. Clean the text (remove HTML tags, normalize formatting).
3. Apply Natural Language Processing (NLP):
   - **Named Entity Recognition (NER):** extract places, dates, currencies, points of interest.
   - **Keyphrase extraction:** detect activities (e.g., kayaking, museum visits).
   - **Sentiment analysis:** measure overall tone.
   - **Regex parsing:** find and normalize costs and currencies.
4. Store results in structured tables (locations, activities, expenses, sentiment).

**Reasoning:** Blogs are narrative and unstructured; NLP enables consistent, analyzable fields like “where,” “when,” “what,” and “cost.”
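
The regex parsing step above could look something like the sketch below. It is a minimal illustration, not a complete parser: the `CURRENCY_CODES` mapping, the `extract_costs` helper, and the sample sentence are all made up for this example, and it only handles amounts written with a leading currency symbol.

```python
import re

# Hypothetical mapping from currency symbols to ISO codes (assumption; extend as needed).
CURRENCY_CODES = {"$": "USD", "€": "EUR", "£": "GBP"}

# A currency symbol followed by an amount such as "40", "1500" or "1,250.50".
COST_PATTERN = re.compile(r"([$€£])\s?(\d+(?:,\d{3})*(?:\.\d+)?)")


def extract_costs(text):
    """Return a list of {'amount': float, 'currency': str} dicts found in text."""
    costs = []
    for symbol, amount in COST_PATTERN.findall(text):
        costs.append({
            # Drop thousands separators before converting to a number.
            "amount": float(amount.replace(",", "")),
            "currency": CURRENCY_CODES[symbol],
        })
    return costs


# Invented sample sentence, standing in for cleaned blog text.
sample = "The hostel in Lisbon cost €40 a night, and the kayak tour was $1,250.50 for the group."
print(extract_costs(sample))
```

Each returned dictionary maps directly onto a row of the expenses table from step 4. Amounts written as words ("forty euros") or with a trailing code ("40 EUR") would need additional patterns.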


2) Audio recordings of customer service calls → Call analytics table

**Method:**
1. Convert audio to text using Automatic Speech Recognition (ASR).
2. Apply speaker diarization to separate agent and customer dialogue.
3. Analyze the transcript with NLP:
   - Classify intent and issue category.
   - Measure sentiment for each speaker.
   - Detect escalations and resolutions.
4. Store metadata: call duration, silence ratio, resolution outcome.

**Reasoning:** Raw audio cannot be searched directly; converting it to text allows automated classification, trend analysis, and performance monitoring.
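
Once diarization has produced speaker segments, the call metadata mentioned above (call duration, silence ratio) can be derived with simple arithmetic. The sketch below assumes a hypothetical diarization output of non-overlapping `(speaker, start, end)` tuples in seconds; real diarization tools each have their own output format, and `call_metrics` is an illustrative helper name.

```python
# Hypothetical diarization output: (speaker, start_sec, end_sec), non-overlapping.
segments = [
    ("agent", 0.0, 4.0),
    ("customer", 5.0, 12.0),
    ("agent", 14.0, 20.0),
]


def call_metrics(segments, call_duration):
    """Compute per-speaker talk time and the share of the call with no speech."""
    talk_time = {}
    for speaker, start, end in segments:
        talk_time[speaker] = talk_time.get(speaker, 0.0) + (end - start)
    spoken = sum(talk_time.values())
    return {
        "call_duration": call_duration,
        "talk_time": talk_time,
        # Silence is whatever part of the call is not covered by any segment.
        "silence_ratio": (call_duration - spoken) / call_duration,
    }


metrics = call_metrics(segments, call_duration=20.0)
print(metrics)
```

The resulting dictionary is one row of the call analytics table; intent, sentiment, and resolution outcome from the NLP step would be added as further columns.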


3) Handwritten notes from a brainstorming session → Action items & themes

**Method:**
1. Scan or photograph the handwritten notes.
2. Use OCR/handwriting recognition to convert images to text.
3. Parse layout: detect bullet points, headings, and checkboxes.
4. Apply NLP clustering to group similar ideas.
5. Extract action items with assigned owners and due dates.

**Reasoning:** Converting messy, handwritten content into digital structured fields makes tracking and follow-up easier.
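
As a rough illustration of the action-item extraction step, the sketch below parses text that has already been through OCR. The note-taking convention it relies on (checkbox lines such as `[ ] task @owner due YYYY-MM-DD`) is an assumption made for this example, as are the sample notes and the `extract_action_items` helper; real handwritten notes will be far less regular.

```python
import re

# Invented OCR output; the "[ ]", "@owner" and "due" conventions are assumptions.
ocr_text = """\
Ideas
- mobile app redesign
[ ] Draft survey questions @maria due 2024-05-10
[x] Book meeting room @tom
- referral program
"""

# Checkbox, task text, then an optional owner and an optional ISO due date.
ACTION_PATTERN = re.compile(
    r"^\[(?P<done>[ xX])\]\s*(?P<task>.*?)"
    r"(?:\s+@(?P<owner>\w+))?"
    r"(?:\s+due\s+(?P<due>\d{4}-\d{2}-\d{2}))?\s*$"
)


def extract_action_items(text):
    """Return one dict per checkbox line; non-checkbox lines are ignored."""
    items = []
    for line in text.splitlines():
        match = ACTION_PATTERN.match(line.strip())
        if match:
            items.append({
                "task": match.group("task"),
                "owner": match.group("owner"),
                "due": match.group("due"),
                "done": match.group("done").lower() == "x",
            })
    return items


for item in extract_action_items(ocr_text):
    print(item)
```

Lines without a checkbox (the plain bullet ideas) are left for the clustering step rather than treated as action items.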


4) A video tutorial on cooking → Recipe dataset with steps & timeline

**Method:**
1. Extract audio from the video and transcribe using ASR.
2. Use on-screen OCR to capture ingredient lists from text overlays.
3. Apply shot/scene detection to segment steps visually.
4. Parse transcript with NLP to extract ingredients, steps, quantities, and tools.
5. Align each step with start/end timestamps and link to representative images.

**Reasoning:** Videos are rich but hard to search; structured recipes enable indexing, search, and analysis of cooking content.
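
The alignment step can be sketched as assigning each transcribed sentence to the scene in which it starts. Everything below is illustrative: the scene start times, the transcript segments, and the `build_steps` helper are hypothetical stand-ins for the outputs of shot detection and ASR, and the sketch assumes no sentence starts before the first scene.

```python
from bisect import bisect_right

# Hypothetical scene start times in seconds, as shot detection might report them.
scene_starts = [0.0, 30.0, 75.0]

# Hypothetical ASR segments: (start_sec, end_sec, text).
transcript = [
    (2.0, 10.0, "Chop two onions."),
    (12.0, 28.0, "Heat oil in a pan."),
    (31.0, 70.0, "Fry the onions until golden."),
    (76.0, 90.0, "Add the tomatoes and simmer."),
]


def build_steps(scene_starts, transcript):
    """Group transcript segments by the scene in which each one starts."""
    steps = {}
    for start, end, text in transcript:
        # Index of the last scene whose start time is <= the segment start.
        scene = bisect_right(scene_starts, start) - 1
        step = steps.setdefault(
            scene, {"step": scene + 1, "start": start, "end": end, "text": []}
        )
        step["start"] = min(step["start"], start)
        step["end"] = max(step["end"], end)
        step["text"].append(text)
    return [steps[key] for key in sorted(steps)]


for step in build_steps(scene_starts, transcript):
    print(step)
```

Each resulting record carries a step number, a start/end timestamp, and the spoken instructions, which is the shape the recipe dataset needs before ingredients and quantities are extracted from the text.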

Exercise 3

In [1]:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/di_bootcamp_resources/week_7/train.csv')

# Display the first 5 rows
display(df.head())

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


Exercise 4

In [2]:
import pandas as pd

iris_df = pd.read_csv('/content/drive/MyDrive/di_bootcamp_resources/week_7/Iris Dataset/Iris.csv')

# Display the first 5 rows
display(iris_df.head())

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


Exercise 5

# Task
Create a simple DataFrame and export it to an Excel file and a JSON file.

## Create DataFrame

### Subtask:
Generate a simple pandas DataFrame with some sample data.


**Reasoning**:
Create a dictionary and convert it to a pandas DataFrame, then display the head.



In [3]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

sample_df = pd.DataFrame(data)
display(sample_df.head())

| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
| 3 | David | 40 | Houston |


## Export to Excel

### Subtask:
Export the created DataFrame to an Excel file.


**Reasoning**:
Export the `sample_df` DataFrame to an Excel file as instructed.



In [4]:
sample_df.to_excel('sample_data.xlsx', index=False)

## Export to JSON

### Subtask:
Export the created DataFrame to a JSON file.


**Reasoning**:
Export the created DataFrame to a JSON file named `sample_data.json` using the `.to_json()` method with the `orient` parameter set to `'records'`.



In [5]:
sample_df.to_json('sample_data.json', orient='records')
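
To make the `'records'` orientation concrete: it writes a JSON array with one object per row, keyed by column name. The sketch below rebuilds that shape from the same sample dictionary using only the standard library. It mirrors the layout rather than pandas' own implementation, and the exact whitespace in the file pandas writes may differ.

```python
import json

# Same sample data as the DataFrame above.
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

# One dictionary per row, keyed by column name: the 'records' layout.
columns = list(data)
records = [dict(zip(columns, row)) for row in zip(*data.values())]

# Serialize the first two rows to show the structure.
print(json.dumps(records[:2], indent=2))
```

This row-oriented layout is usually the easiest one for other tools and APIs to consume, since each object is self-describing.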

## Summary:

### Data Analysis Key Findings

*   A pandas DataFrame named `sample_df` was created with sample data for 'Name', 'Age', and 'City'.
*   The `sample_df` DataFrame was successfully exported to an Excel file named `sample_data.xlsx` without the index.
*   The `sample_df` DataFrame was successfully exported to a JSON file named `sample_data.json` in the 'records' format.

### Insights or Next Steps

*   The process demonstrates the basic functionality of creating a pandas DataFrame and exporting it to common file formats like Excel and JSON.


Exercise 6

In [6]:
import pandas as pd

try:
    posts_df = pd.read_json('/content/drive/MyDrive/di_bootcamp_resources/week_7/posts.json')
    display(posts_df.head())
except FileNotFoundError:
    print("Error: The file was not found. Please ensure the file path is correct and you have access.")
except Exception as e:
    print(f"An error occurred: {e}")

| | userId | id | title | body |
|---|---|---|---|---|
| 0 | 1 | 1 | sunt aut facere repellat provident occaecati e... | quia et suscipit\nsuscipit recusandae consequu... |
| 1 | 1 | 2 | qui est esse | est rerum tempore vitae\nsequi sint nihil repr... |
| 2 | 1 | 3 | ea molestias quasi exercitationem repellat qui... | et iusto sed quo iure\nvoluptatem occaecati om... |
| 3 | 1 | 4 | eum et est occaecati | ullam et saepe reiciendis voluptatem adipisci\... |
| 4 | 1 | 5 | nesciunt quas odio | repudiandae veniam quaerat sunt sed\nalias aut... |
