# Task
Unzip the file from `/content/drive/MyDrive/Data set/Données sur les facteurs de performance des élèves.zip`, list the extracted files, then load the first identified data file into a pandas DataFrame, display the first 5 rows along with its columns and data types, and finally, summarize the unzipping process and initial data loading.

## Unzip File

### Subtask:
Unzip the provided file from '/content/drive/MyDrive/Data set/Données sur les facteurs de performance des élèves.zip' to a temporary directory. This step will extract all the contents of the zip archive.


**Reasoning**:
The subtask requires unzipping a file. This code block will import the necessary 'os' module, define the file and extraction paths, create the extraction directory if it doesn't exist, and then use a shell command to unzip the file.



In [4]:
import os

zip_file_path = '/content/drive/MyDrive/Data set/Données sur les facteurs de performance des élèves.zip'
extraction_directory = '/tmp/extracted_data'

# Create the extraction directory if it does not exist
os.makedirs(extraction_directory, exist_ok=True)

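# Unzip the file. Without the -o flag, unzip prompts before overwriting
# files left by an earlier run, which stalls the cell (see the output below).
!unzip "{zip_file_path}" -d "{extraction_directory}"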

Archive:  /content/drive/MyDrive/Data set/Données sur les facteurs de performance des élèves.zip
replace /tmp/extracted_data/StudentPerformanceFactors.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: 
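
The `replace ...?` prompt above shows `unzip` waiting for keyboard input because the target file already exists. As a minimal sketch of a non-interactive alternative, the extraction can be done with Python's standard `zipfile` module, which overwrites existing files without asking. The helper name `extract_zip` and the throwaway demo archive are assumptions made for illustration; they are not part of the original notebook.

```python
import os
import tempfile
import zipfile


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)  # overwrites existing files without prompting
        return archive.namelist()


# Demo on a throwaway archive (hypothetical contents, not the real dataset)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("example.csv", "a,b\n1,2\n")
    members = extract_zip(demo_zip, os.path.join(tmp, "out"))
    print(members)
```

In the notebook itself, the call would presumably be `extract_zip(zip_file_path, extraction_directory)`, and the returned list could stand in for the separate listing step.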

## List Extracted Files

### Subtask:
List all files and directories extracted from the zip archive to understand the structure and identify the data files within.


**Reasoning**:
To fulfill the subtask of listing all extracted files, I will use `os.listdir()` on the `extraction_directory` and print the result.



In [2]:
extracted_files = os.listdir(extraction_directory)
print("Extracted files and directories:")
for item in extracted_files:
    print(f"- {item}")

Extracted files and directories:
- StudentPerformanceFactors.csv
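
`os.listdir()` returns entries in arbitrary order, so taking the first element is only safe here because the archive holds a single file. The sketch below shows one way to pick data files deterministically by extension. The helper name `find_data_files` and the default suffix tuple are assumptions, and the demo runs on placeholder files rather than the extracted dataset.

```python
import tempfile
from pathlib import Path


def find_data_files(directory, suffixes=(".csv",)):
    """Return files under directory whose suffix matches, sorted by path."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


# Demo on placeholder files (hypothetical names, not the real dataset)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.csv", "a.CSV", "notes.txt"):
        (Path(tmp) / name).write_text("x\n")
    names = [p.name for p in find_data_files(tmp)]
    print(names)
```

Sorting the matches makes "the first identified data file" reproducible across runs and file systems.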


## Load and Display Data

### Subtask:
Assuming the zip file contains a structured data file (e.g., CSV), load the first identified data file into a pandas DataFrame and display the first 5 rows to get a quick overview of its content. This step will also display the columns and their data types.


**Reasoning**:
To load the data, I need to import the pandas library, construct the full file path, read the CSV into a DataFrame, and then display the first 5 rows together with the column names and data types, as requested in the subtask.



In [3]:
import pandas as pd
import os

# Construct the full path to the data file
data_file_path = os.path.join(extraction_directory, extracted_files[0])

# Load the CSV file into a pandas DataFrame
df = pd.read_csv(data_file_path)

print("First 5 rows of the DataFrame:")
print(df.head())

print("\nDataFrame Info (columns and data types):")
df.info()

First 5 rows of the DataFrame:
   Hours_Studied  Attendance Parental_Involvement Access_to_Resources  \
0             23          84                  Low                High   
1             19          64                  Low              Medium   
2             24          98               Medium              Medium   
3             29          89                  Low              Medium   
4             19          92               Medium              Medium   

  Extracurricular_Activities  Sleep_Hours  Previous_Scores Motivation_Level  \
0                         No            7               73              Low   
1                         No            8               59              Low   
2                        Yes            7               91           Medium   
3                        Yes            8               98           Medium   
4                        Yes            6               65           Medium   

  Internet_Access  Tutoring_Sessions Family_Income Teac

## Final Task

### Subtask:
Provide a summary of the unzipping process and the initial data loading, confirming the files extracted and the first few rows of the data.


## Summary:

### Data Analysis Key Findings
*   The zip file `/content/drive/MyDrive/Data set/Données sur les facteurs de performance des élèves.zip` was unzipped to `/tmp/extracted_data`, yielding a single file named `StudentPerformanceFactors.csv`. The `replace ...?` prompt in the recorded unzip output indicates that the file was already present from an earlier run of the cell.
*   The `StudentPerformanceFactors.csv` file was loaded into a pandas DataFrame, which contains 6607 entries and 20 columns.
*   The DataFrame consists of 7 integer (`int64`) columns and 13 object (`object`) columns.
*   Initial inspection revealed that several columns, including `Teacher_Quality`, `Parental_Education_Level`, and `Distance_from_Home`, have missing values (non-null counts less than 6607).
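
The missing-value finding can be checked directly instead of reading it off the `df.info()` output. The following is a minimal sketch on a miniature frame: the column names come from the dataset, but the cell values and the name `demo` are hypothetical. In the notebook, the same two lines would be applied to `df`.

```python
import pandas as pd

# Hypothetical miniature frame; the real `df` comes from pd.read_csv above
demo = pd.DataFrame({
    "Hours_Studied": [23, 19, 24],
    "Teacher_Quality": ["Medium", None, "High"],
    "Distance_from_Home": ["Near", "Far", None],
})

# Count missing values per column and keep only the columns that have any
missing = demo.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```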

### Insights or Next Steps
*   A thorough data cleaning process, particularly addressing the identified missing values, will be essential before further analysis.
*   Further exploratory data analysis (EDA) should be conducted to understand the distribution of variables, identify potential outliers, and examine relationships between columns, especially focusing on the 'object' type columns for potential encoding needs.
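
As a rough sketch of the encoding step mentioned above, the non-numeric columns can be listed, their cardinality inspected, and one-hot encoding applied with `pd.get_dummies`. The frame `demo` reuses a few values from the first rows shown earlier. One-hot encoding is only one possible choice, assumed here for illustration; ordered levels such as Low/Medium/High may be better served by an ordinal mapping.

```python
import pandas as pd

# Small demo frame built from values visible in the head() output above
demo = pd.DataFrame({
    "Hours_Studied": [23, 19, 24, 29],
    "Parental_Involvement": ["Low", "Low", "Medium", "Low"],
    "Extracurricular_Activities": ["No", "No", "Yes", "Yes"],
})

# Non-numeric columns are the candidates for encoding
object_cols = demo.select_dtypes(exclude="number").columns.tolist()

# Number of distinct levels per categorical column
cardinality = demo[object_cols].nunique()

# One-hot encode the categorical columns; numeric columns pass through unchanged
encoded = pd.get_dummies(demo, columns=object_cols)

print(cardinality)
print(encoded.columns.tolist())
```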
