## 🌟 Exercise 1: Identifying Data Types

| Data Source                                               | Type          | Explanation                                                                 |
|-----------------------------------------------------------|---------------|------------------------------------------------------------------------------|
| A company’s financial reports stored in an Excel file     | Structured    | Organized into rows and columns with labeled fields.                        |
| Photographs uploaded to a social media platform           | Unstructured  | Image files with no predefined data model.                                  |
| A collection of news articles on a website                | Unstructured  | Free-form text, not stored in tabular form.                                 |
| Inventory data in a relational database                   | Structured    | Stored in tables with defined schemas.                                      |
| Recorded interviews from a market research study          | Unstructured  | Audio files without structure, require transcription to analyze.            |


## 🌟 Exercise 2: Transformation of Unstructured Data

| Unstructured Data                        | Transformation Method |
|------------------------------------------|------------------------|
| Blog posts about travel                  | Use NLP (Natural Language Processing) to extract places, dates, and keywords, then build a structured dataset from them. |
| Audio recordings from customer service   | Apply speech-to-text (automatic speech recognition, ASR), then structure the transcripts into columns such as topic, emotion, and agent. |
| Handwritten brainstorming notes          | Use OCR to convert the handwriting to text, then organize the ideas into bullet points, categories, or mind maps. |
| Video cooking tutorial                   | Extract subtitles and metadata, then segment the steps into columns such as step number, action, and ingredients. |
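
As a rough illustration of the first row, the sketch below turns free-text travel posts into flat records. The sample posts, the `KNOWN_PLACES` list, and the `extract_record` helper are all hypothetical; a simple regular expression and a lookup list stand in for a real NLP pipeline, which would normally use a named-entity recognizer.

```python
import re

# Hypothetical sample posts, made up for illustration.
posts = [
    "We landed in Lisbon on 2023-05-14 and ate pastries by the river.",
    "Kyoto in autumn: arrived 2022-11-03, then took a day trip to Nara.",
]

# Stand-in for a real named-entity model or gazetteer (assumption).
KNOWN_PLACES = ["Lisbon", "Kyoto", "Nara"]

# Matches ISO-style dates such as 2023-05-14.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_record(post_id, text):
    """Turn one free-text post into a flat record with fixed fields."""
    return {
        "post_id": post_id,
        "places": [place for place in KNOWN_PLACES if place in text],
        "dates": DATE_PATTERN.findall(text),
        "word_count": len(text.split()),
    }


# Each post becomes one row with the same set of columns.
records = [extract_record(i, text) for i, text in enumerate(posts)]
for record in records:
    print(record)
```

Because every record has the same keys, the list could be passed directly to `pd.DataFrame(records)` to obtain a table.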


## 🌟 Exercise 3

In [9]:
import pandas as pd

# Load the Titanic training set and preview the first five rows
df_train = pd.read_csv("train/train.csv")
print(df_train.head())


   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  


## 🌟 Exercise 4

In [10]:
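import pandas as pd

# The output below shows the first data row used as column labels, so the
# file appears to have no header row; header=None would keep it as data.
df_iris = pd.read_csv("Iris_dataset/Iris_dataset.csv")
df_iris.head()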


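|   | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
|---|-----|-----|-----|-----|-------------|
| 0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |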
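
The preview above uses the first observation (5.1, 3.5, 1.4, 0.2, Iris-setosa) as column labels, which suggests the CSV has no header row. A minimal sketch of one way to handle that is to pass `header=None` together with explicit `names`. The column names below are an assumption, since the file does not state them, and the sample rows are read from an in-memory string rather than from the real file.

```python
import io

import pandas as pd

# First rows as they appear in the preview; a real run would pass the file path.
raw_csv = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""

# Assumed column names: the source file does not provide any.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# header=None tells pandas there is no header row; names supplies the labels.
df_iris = pd.read_csv(io.StringIO(raw_csv), header=None, names=columns)
print(df_iris.head())
```

With this call the first observation stays in the data as row 0 instead of being consumed as the header.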


## 🌟 Exercise 5

In [14]:
!pip install xlrd

Collecting xlrd
  Downloading xlrd-2.0.2-py2.py3-none-any.whl.metadata (3.5 kB)
Downloading xlrd-2.0.2-py2.py3-none-any.whl (96 kB)
Installing collected packages: xlrd
Successfully installed xlrd-2.0.2


In [12]:
!pip install openpyxl


Collecting openpyxl
  Downloading openpyxl-3.1.5-py2.py3-none-any.whl.metadata (2.5 kB)
Collecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-2.0.0-py3-none-any.whl.metadata (2.7 kB)
Downloading openpyxl-3.1.5-py2.py3-none-any.whl (250 kB)
Downloading et_xmlfile-2.0.0-py3-none-any.whl (18 kB)
Installing collected packages: et-xmlfile, openpyxl


In [13]:
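import pandas as pd

# Build a small sample DataFrame (column names are in French)
df = pd.DataFrame({
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['Paris', 'Lyon', 'Marseille']
})

# Write the data to an Excel workbook without the index column (uses openpyxl)
df.to_excel("sample_data.xlsx", index=False)

# Write the same data as a JSON array of records
df.to_json("sample_data.json", orient='records', indent=2)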
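
To check that the JSON export preserves the data, one option is a quick round trip. The sketch below works in memory instead of on `sample_data.json`, and it adds `force_ascii=False`, which is not in the cell above, so that accented characters such as the `Â` in `Âge` stay readable in the JSON text.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Nom": ["Alice", "Bob", "Charlie"],
    "Âge": [25, 30, 35],
    "Ville": ["Paris", "Lyon", "Marseille"],
})

# With no path argument, to_json returns the JSON text instead of writing a file.
# force_ascii=False keeps non-ASCII characters unescaped in the output.
json_text = df.to_json(orient="records", indent=2, force_ascii=False)

# Read the records back and compare them with the original rows.
restored = pd.read_json(io.StringIO(json_text), orient="records")
same_rows = restored.to_dict(orient="records") == df.to_dict(orient="records")
print(json_text)
print("Round trip preserved the rows:", same_rows)
```

The same comparison could be run against the file on disk by passing `"sample_data.json"` to `pd.read_json`.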

## 🌟 Exercise 6

In [15]:
import pandas as pd

json_path = "posts/posts.json"

# Load the JSON array of posts into a DataFrame and preview the first rows
df_json = pd.read_json(json_path)
df_json.head()


|   | userId | id | title | body |
|---|--------|----|-------|------|
| 0 | 1 | 1 | sunt aut facere repellat provident occaecati e... | quia et suscipit\nsuscipit recusandae consequu... |
| 1 | 1 | 2 | qui est esse | est rerum tempore vitae\nsequi sint nihil repr... |
| 2 | 1 | 3 | ea molestias quasi exercitationem repellat qui... | et iusto sed quo iure\nvoluptatem occaecati om... |
| 3 | 1 | 4 | eum et est occaecati | ullam et saepe reiciendis voluptatem adipisci\... |
| 4 | 1 | 5 | nesciunt quas odio | repudiandae veniam quaerat sunt sed\nalias aut... |


In [16]:
!git add .
!git commit -m "Exp 1: Pandas - DataFrame, CSV, Excel, JSON"
!git push



[main 5c915eb] Exp 1: Pandas - DataFrame, CSV, Excel, JSON
 9 files changed, 2100 insertions(+)
 create mode 100644 S2/J2/Exp.ipynb
 create mode 100644 S2/J2/Iris_dataset.zip
 create mode 100644 S2/J2/Iris_dataset/Iris_dataset.csv
 create mode 100644 S2/J2/posts.zip
 create mode 100644 S2/J2/posts/posts.json
 create mode 100644 S2/J2/sample_data.json
 create mode 100644 S2/J2/sample_data.xlsx
 create mode 100644 S2/J2/train.zip
 create mode 100644 S2/J2/train/train.csv


To https://github.com/ID18030104/AI-exercises-checker.git
   6a80477..5c915eb  main -> main
