# 🌟 Exercise 1: Identifying Data Types  
Below are various data sources. Identify whether each one is an example of structured or unstructured data.  

A company’s financial reports stored in an Excel file. -- **STRUCTURED**  
Photographs uploaded to a social media platform. -- **UNSTRUCTURED**  
A collection of news articles on a website. -- **UNSTRUCTURED**  
Inventory data in a relational database. -- **STRUCTURED**  
Recorded interviews from a market research study. -- **UNSTRUCTURED**  

# 🌟 Exercise 2: Transformation Exercise  
For each of the following unstructured data sources, propose a method to convert it into structured data. Explain your reasoning.  

### 1. A series of blog posts about travel experiences.
- **Method**: Convert the blog posts into structured data by extracting key elements like title, author, date, location mentioned, and main topics. Store this information in a table with columns corresponding to each element.
- **Reasoning**: This approach allows for easy searching, filtering, and analysis of content across different blog posts.
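
As a minimal sketch of this extraction, assume each post starts with `Title:`, `Author:`, and `Date:` header lines, followed by a blank line and the body. The `BlogRecord` class, the `KNOWN_LOCATIONS` list, and the sample post are all hypothetical; real posts would more likely need an HTML parser and a proper named-entity recognizer for locations.

```python
import re
from dataclasses import dataclass, asdict

# Hypothetical list of place names to look for in the body text.
KNOWN_LOCATIONS = ["Lisbon", "Kyoto", "Lima"]


@dataclass
class BlogRecord:
    title: str
    author: str
    date: str
    locations: list


def parse_post(raw: str) -> BlogRecord:
    """Turn one raw post into a row with fixed columns."""
    # Collect the "Key: value" header lines into a dictionary.
    header = dict(re.findall(r"^(Title|Author|Date):\s*(.+)$", raw, flags=re.MULTILINE))
    # Everything after the first blank line is treated as the body.
    body = raw.split("\n\n", 1)[1] if "\n\n" in raw else ""
    locations = [
        place for place in KNOWN_LOCATIONS
        if re.search(rf"\b{re.escape(place)}\b", body)
    ]
    return BlogRecord(
        title=header.get("Title", ""),
        author=header.get("Author", ""),
        date=header.get("Date", ""),
        locations=locations,
    )


sample_post = """Title: Three Days in Lisbon
Author: A. Writer
Date: 2024-05-01

We landed in Lisbon on a Friday and later compared it with Kyoto."""

record = parse_post(sample_post)
print(asdict(record))
```

Each `BlogRecord` maps to one table row, so a list of them can be passed to `pd.DataFrame` for filtering and analysis.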

### 2. Audio recordings of customer service calls.
- **Method**: Use speech-to-text software to transcribe the audio recordings, then extract key information such as customer ID, issue type, and resolution status from the transcripts. Combine this with call metadata such as call duration, and store the result in a structured format like a table.
- **Reasoning**: Transcription converts spoken words into text, which is much easier to search, categorize, and analyze than raw audio.
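
The step after transcription could look roughly like the sketch below. It assumes the transcript already exists as plain text and that the call duration is supplied separately. The `ISSUE_KEYWORDS` map, the `structure_call` function, and the sample transcript are hypothetical, and keyword matching is only a stand-in for a real classifier.

```python
import re

# Hypothetical keyword lists used to guess the issue type.
ISSUE_KEYWORDS = {
    "billing": ["invoice", "charge", "refund"],
    "technical": ["error", "crash", "login"],
}


def structure_call(transcript: str, duration_seconds: int) -> dict:
    """Build one structured row from a call transcript."""
    text = transcript.lower()
    # Look for a spoken customer ID such as "customer ID 48213".
    id_match = re.search(r"customer id\s+(\d+)", text)
    # Pick the first issue type whose keywords appear in the transcript.
    issue_type = next(
        (label for label, words in ISSUE_KEYWORDS.items()
         if any(word in text for word in words)),
        "other",
    )
    resolved = "resolved" in text and "not resolved" not in text
    return {
        "customer_id": id_match.group(1) if id_match else None,
        "issue_type": issue_type,
        "resolved": resolved,
        "duration_seconds": duration_seconds,
    }


sample_transcript = (
    "Agent: Thanks for calling. Can I have your customer ID? "
    "Caller: Customer ID 48213. I was charged twice on my last invoice. "
    "Agent: I have issued a refund, so this is now resolved."
)

row = structure_call(sample_transcript, duration_seconds=312)
print(row)
```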

### 3. Handwritten notes from a brainstorming session.
- **Method**: Use optical character recognition (OCR) with handwriting support to digitize the notes, then organize the content into structured categories like ideas, actions, and responsible persons.
- **Reasoning**: OCR converts handwritten content into machine-readable text, which can then be structured for easier analysis and follow-up.
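
The categorization step could be sketched as follows, assuming the OCR tool has already returned the notes as a list of text lines. The `IDEA:` and `ACTION:` prefixes and the `@name` convention for responsible persons are hypothetical note-taking conventions chosen for this illustration.

```python
import re


def categorise_notes(ocr_lines):
    """Sort OCR output lines into ideas, actions, and uncategorised notes."""
    rows = []
    for line in ocr_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines produced by the OCR step
        # An "@name" token marks the responsible person, if any.
        owner_match = re.search(r"@(\w+)", line)
        owner = owner_match.group(1) if owner_match else None
        if line.upper().startswith("ACTION:"):
            category, text = "action", line[len("ACTION:"):]
        elif line.upper().startswith("IDEA:"):
            category, text = "idea", line[len("IDEA:"):]
        else:
            category, text = "uncategorised", line
        # Remove the owner token from the text column.
        text = re.sub(r"@\w+", "", text).strip()
        rows.append({"category": category, "text": text, "owner": owner})
    return rows


sample_lines = [
    "IDEA: loyalty card for repeat visitors",
    "ACTION: draft survey @maria",
    "",
    "budget?",
]

notes = categorise_notes(sample_lines)
for note in notes:
    print(note)
```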

### 4. A video tutorial on cooking.
- **Method**: Transcribe the narration, break the video down into timestamped steps, extract key information like ingredients, quantities, and cooking times, and store this in a structured format such as a recipe database.
- **Reasoning**: Structuring the content allows users to easily search for specific recipes, steps, or ingredients, and follow the instructions more effectively.
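
A rough sketch of the extraction, assuming the narration has already been transcribed and split into one sentence per step. The patterns only recognize phrases like "200 g of flour" and "25 minutes"; the function name, the unit list, and the sample narration are hypothetical.

```python
import re

# Matches phrases such as "200 g of flour" or "2 tsp of salt".
INGREDIENT = re.compile(r"(\d+(?:\.\d+)?)\s*(g|ml|tsp|tbsp)\s+of\s+([a-z]+)")
# Matches durations such as "25 minutes".
DURATION = re.compile(r"(\d+)\s*minutes?")


def structure_recipe(steps):
    """Build an ingredient list and a step table from narrated steps."""
    ingredients, structured_steps = [], []
    for number, narration in enumerate(steps, start=1):
        text = narration.lower()
        for quantity, unit, name in INGREDIENT.findall(text):
            ingredients.append(
                {"name": name, "quantity": float(quantity), "unit": unit}
            )
        duration = DURATION.search(text)
        structured_steps.append({
            "step": number,
            "instruction": narration,
            "minutes": int(duration.group(1)) if duration else None,
        })
    return {"ingredients": ingredients, "steps": structured_steps}


narration = ["Add 200 g of flour and 2 tsp of salt.", "Bake for 25 minutes."]
recipe = structure_recipe(narration)
print(recipe["ingredients"])
print(recipe["steps"])
```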


# Exercise 3
### 1. Transaction records
- **Category**: Structured
- **Use**: Analyze sales trends, customer purchasing behavior, and inventory management to optimize stock levels and pricing strategies.
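
Because transaction records already have fixed fields, a trend analysis can start with a simple aggregation. The sketch below sums revenue per month; the field names and the three sample transactions are made up for illustration.

```python
from collections import defaultdict

# Hypothetical transaction records with a fixed set of fields.
transactions = [
    {"date": "2024-01-15", "product": "widget", "quantity": 2, "unit_price": 9.5},
    {"date": "2024-01-20", "product": "gadget", "quantity": 1, "unit_price": 20.0},
    {"date": "2024-02-03", "product": "widget", "quantity": 4, "unit_price": 9.5},
]


def monthly_revenue(rows):
    """Sum quantity * unit_price per calendar month (YYYY-MM)."""
    totals = defaultdict(float)
    for row in rows:
        month = row["date"][:7]  # "2024-01-15" -> "2024-01"
        totals[month] += row["quantity"] * row["unit_price"]
    return dict(totals)


revenue = monthly_revenue(transactions)
print(revenue)
```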

### 2. Customer feedback comments
- **Category**: Unstructured
- **Use**: Perform sentiment analysis to understand customer satisfaction and identify areas for product or service improvement.
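
To make the idea of sentiment analysis concrete, here is a deliberately simple lexicon-based sketch: it counts positive and negative words and labels each comment by the difference. The word lists and comments are hypothetical, and a production system would normally use a trained model rather than fixed word lists.

```python
import re

# Tiny hypothetical sentiment lexicons.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "rude", "late"}


def sentiment_score(comment: str) -> int:
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_label(comment: str) -> str:
    score = sentiment_score(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


comments = [
    "Great product, fast delivery!",
    "The courier was rude and the parcel arrived late.",
    "It is a box.",
]

labels = [sentiment_label(c) for c in comments]
print(list(zip(comments, labels)))
```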

### 3. Social media posts about your brand
- **Category**: Unstructured
- **Use**: Monitor brand sentiment, track marketing campaign effectiveness, and engage with customers to enhance brand loyalty.

### 4. Employee work schedules
- **Category**: Structured
- **Use**: Optimize staffing levels to ensure adequate coverage during peak hours and improve employee productivity and satisfaction.
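
One way to check coverage is to count how many employees are on shift in each hour and flag the hours below a minimum. The shift data, the opening hours, and the minimum of two people are assumptions made for this sketch.

```python
def coverage_by_hour(shifts, opening=9, closing=17):
    """Count employees on shift for each hour between opening and closing."""
    coverage = {hour: 0 for hour in range(opening, closing)}
    for shift in shifts:
        # A shift from 9 to 13 covers the hours 9, 10, 11, and 12.
        for hour in range(shift["start"], shift["end"]):
            if hour in coverage:
                coverage[hour] += 1
    return coverage


# Hypothetical schedule for one day.
shifts = [
    {"employee": "A", "start": 9, "end": 13},
    {"employee": "B", "start": 12, "end": 17},
    {"employee": "C", "start": 11, "end": 17},
]

MINIMUM_STAFF = 2  # assumed staffing requirement

coverage = coverage_by_hour(shifts)
understaffed = [hour for hour, count in coverage.items() if count < MINIMUM_STAFF]
print(coverage)
print("Understaffed hours:", understaffed)
```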


# Exercise 4

In [1]:
#!pip install kaggle

In [2]:
!kaggle competitions download -c titanic

Downloading titanic.zip to c:\Users\d1411\Документы\Python Projects\DI\Week4\Day3\ExercisesXP




  0%|          | 0.00/34.1k [00:00<?, ?B/s]
100%|██████████| 34.1k/34.1k [00:00<00:00, 177kB/s]


In [4]:
import zipfile
import pandas as pd

# Open the ZIP file
with zipfile.ZipFile('titanic.zip') as z:
    # List all files in the ZIP
    print(z.namelist())
    
    # Load the training data
    with z.open('train.csv') as f:
        train_df = pd.read_csv(f)
    
    # Load the test data
    with z.open('test.csv') as f:
        test_df = pd.read_csv(f)


['gender_submission.csv', 'test.csv', 'train.csv']


In [5]:
train_df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |


In [6]:
test_df.head()

|   | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 |  | Q |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0 |  | S |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 |  | Q |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 |  | S |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 |  | S |
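
A natural next step after loading is to check for missing values and compute a first summary. The sketch below rebuilds only the five training rows printed above from an inline CSV string, so it runs without the Kaggle download. With so few rows the numbers are not representative of the full dataset; on the real data the same calls would be applied to `train_df`.

```python
import io

import pandas as pd

# The five training rows shown in the head() output, as inline CSV.
csv_text = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
'''

sample = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column (empty CSV fields become NaN).
missing_per_column = sample.isna().sum()

# Share of survivors per sex within this tiny sample.
survival_by_sex = sample.groupby("Sex")["Survived"].mean()

print(missing_per_column)
print(survival_by_sex)
```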


# Exercise 5

In [8]:
import seaborn as sns

iris_data = sns.load_dataset('iris')
iris_data.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
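
Once the data is loaded, a per-species summary is a common first look. The sketch below uses only the five rows printed above, typed in as inline CSV so it runs offline; all five are `setosa`, so the result has a single group. On the full dataset the same `groupby` call would be applied to `iris_data`.

```python
import io

import pandas as pd

# The five rows shown in the head() output, as inline CSV.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Mean sepal and petal length for each species in the sample.
summary = sample.groupby("species")[["sepal_length", "petal_length"]].mean()
print(summary)
```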


# Exercise 6

In [14]:
#!pip install faker
#!pip install openpyxl

In [12]:
import pandas as pd
from faker import Faker

# Initialize Faker
fake = Faker()

# Create a DataFrame with fake data
data = {
    'Name': [fake.name() for _ in range(10)],
    'Address': [fake.address() for _ in range(10)],
    'Email': [fake.email() for _ in range(10)],
    'Phone Number': [fake.phone_number() for _ in range(10)]
}

df = pd.DataFrame(data)

df.head()


|   | Name | Address | Email | Phone Number |
|---|---|---|---|---|
| 0 | Kevin Taylor | 953 Dominguez Brooks\nMurrayview, IN 41268 | mckenzieelaine@example.org | +1-432-942-8864x885 |
| 1 | Tracy Jones | USNS James\nFPO AA 25676 | chasegreen@example.com | 782-505-4865 |
| 2 | Allen May | 43141 Hunter Ferry\nSouth Kathy, SD 26609 | brooksdebbie@example.org | 9797593798 |
| 3 | William Caldwell | 09260 Keith Expressway Apt. 091\nNorth Paigebu... | fisherstephanie@example.net | +1-577-773-1169x22896 |
| 4 | Adam Ruiz | PSC 8313, Box 2098\nAPO AP 31969 | melissamacias@example.com | +1-222-297-7956x66909 |


In [15]:

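# Save as an Excel workbook (needs openpyxl); index=False omits the row index
df.to_excel('fake_data.xlsx', index=False)
# Save as JSON Lines: one JSON object per row, one row per line
df.to_json('fake_data.json', orient='records', lines=True)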
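
With `orient='records'` and `lines=True`, the JSON file holds one object per line, so it can be read back with the standard library alone. The sketch below illustrates that layout with a hand-written two-record sample in a temporary directory rather than the real `fake_data.json`; `read_json_lines` is a hypothetical helper name. Newlines inside an address are escaped as `\n`, which is what keeps each record on a single line.

```python
import json
import tempfile
from pathlib import Path


def read_json_lines(path):
    """Read a JSON Lines file into a list of dictionaries."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # ignore blank lines
                records.append(json.loads(line))
    return records


# Two sample records in the same shape as the exported rows.
sample_records = [
    {"Name": "Kevin Taylor", "Address": "953 Dominguez Brooks\nMurrayview, IN 41268"},
    {"Name": "Tracy Jones", "Address": "USNS James\nFPO AA 25676"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.json"
    # Write one JSON object per line, mirroring the lines=True layout.
    lines = [json.dumps(record) for record in sample_records]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    loaded = read_json_lines(path)

print(loaded)
```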

# Exercise 7

In [16]:
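import pandas as pd

# Public placeholder API that returns a JSON array of posts
url = "https://jsonplaceholder.typicode.com/posts"

# pandas can read JSON directly from a URL
df = pd.read_json(url)

df.head()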


|   | userId | id | title | body |
|---|---|---|---|---|
| 0 | 1 | 1 | sunt aut facere repellat provident occaecati e... | quia et suscipit\nsuscipit recusandae consequu... |
| 1 | 1 | 2 | qui est esse | est rerum tempore vitae\nsequi sint nihil repr... |
| 2 | 1 | 3 | ea molestias quasi exercitationem repellat qui... | et iusto sed quo iure\nvoluptatem occaecati om... |
| 3 | 1 | 4 | eum et est occaecati | ullam et saepe reiciendis voluptatem adipisci\... |
| 4 | 1 | 5 | nesciunt quas odio | repudiandae veniam quaerat sunt sed\nalias aut... |
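
To show what happens to such a payload after it is fetched, the sketch below parses an inline JSON string instead of calling the API, so it runs offline. It reuses the three untruncated titles from the output above and adds one clearly hypothetical post for a second user; the `body` field is left out because the bodies shown above are truncated.

```python
import json
from collections import Counter

# Inline stand-in for the API response (no network call is made).
payload = """[
  {"userId": 1, "id": 2, "title": "qui est esse"},
  {"userId": 1, "id": 4, "title": "eum et est occaecati"},
  {"userId": 1, "id": 5, "title": "nesciunt quas odio"},
  {"userId": 2, "id": 999, "title": "hypothetical extra post"}
]"""

posts = json.loads(payload)

# Number of posts written by each user.
posts_per_user = Counter(post["userId"] for post in posts)

# Word count of each title, keyed by post id.
title_word_counts = {post["id"]: len(post["title"].split()) for post in posts}

print(posts_per_user)
print(title_word_counts)
```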
