<a href="https://colab.research.google.com/github/darvesh-sd/Copy-of-TPSession1.ipynb/blob/main/Copy_of_TPSession1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Data Manipulation with Python pandas - Darvesh

This segment imports the required libraries: pandas for data manipulation, matplotlib and seaborn for visualization, and files (from google.colab) for uploading your data file to the Colab session.

In [21]:
# Step 1: Import necessary libraries
import pandas as pd              # For data manipulation and analysis
import matplotlib.pyplot as plt   # For data visualization
import seaborn as sns            # For advanced data visualization
from google.colab import files    # For uploading files in Google Colab

# Set visualization style for seaborn
sns.set_theme(style='whitegrid')  # Use a white grid background for plots (sns.set is an older alias)


This code prompts the user to upload the Excel file containing car brands data. The uploaded file is stored in a dictionary that maps each file name to its contents as bytes.

The cells that follow read the uploaded file into a DataFrame using pandas. For Excel files, engine='openpyxl' tells pandas to use the openpyxl library, which reads the .xlsx format. The first few rows of the DataFrame are then displayed to give a preview of the data.

In [22]:
# Step 2: Upload the Excel file
print("Please upload the Excel file containing car brands data:")
uploaded = files.upload()  # Opens a file dialog for uploading the file



Please upload the Excel file containing car brands data:


Saving car_brands.xlsx to car_brands (1).xlsx
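To make the shape of that dictionary concrete, here is a minimal sketch that runs outside Colab. It replaces the real return value of files.upload() with a hand-built dictionary; the file name and contents are made up for illustration.

```python
# Stand-in for the dictionary returned by files.upload() in Colab.
# The file name and the bytes below are hypothetical sample data.
uploaded = {"car_brands.csv": b"Brand,Fuel,Price\nToyota,Petrol,20000\n"}

# Keys are file names, values are the raw file contents as bytes.
for name, content in uploaded.items():
    print(f"{name}: {len(content)} bytes")

# Name of the first (here: only) uploaded file.
first_name = next(iter(uploaded))
```

next(iter(uploaded)) is an equivalent, slightly shorter way to write list(uploaded.keys())[0].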


If the uploaded file is a CSV file rather than an Excel file, run this cell instead of the Excel cell below it.

In [None]:
# Step 3: Read the uploaded CSV file into a DataFrame
df = pd.read_csv(list(uploaded.keys())[0])  # Read the first uploaded file

# Display the first few rows of the DataFrame
print("Data Preview:")
print(df.head())  # Show the first 5 rows of the DataFrame


In [None]:
# Step 3: Read the uploaded Excel file into a DataFrame
# Specify the engine as 'openpyxl' for .xlsx files
df = pd.read_excel(list(uploaded.keys())[0], engine='openpyxl')  # Read the first uploaded file

# Display the first few rows of the DataFrame
print("Data Preview:")
print(df.head())  # Show the first 5 rows of the DataFrame
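Instead of keeping two alternative cells, the reader can be picked from the file extension. The following is only a sketch under assumptions: read_uploaded is a hypothetical helper (not part of pandas or Colab), and the sample dictionary stands in for the result of files.upload() with made-up data.

```python
import io
from pathlib import Path

import pandas as pd


def read_uploaded(name, data):
    """Hypothetical helper: read uploaded bytes into a DataFrame based on the file extension."""
    suffix = Path(name).suffix.lower()
    buffer = io.BytesIO(data)  # Wrap the raw bytes in a file-like object
    if suffix == ".csv":
        return pd.read_csv(buffer)
    if suffix == ".xlsx":
        return pd.read_excel(buffer, engine="openpyxl")
    raise ValueError(f"Unsupported file type: {suffix!r}")


# Made-up stand-in for the dictionary returned by files.upload()
sample = {"cars.csv": b"Brand,Fuel,Price\nToyota,Petrol,20000\nTesla,Electric,45000\n"}
name = next(iter(sample))
cars = read_uploaded(name, sample[name])
print(cars.head())
```

Reading from the bytes in the dictionary, rather than from the saved file, avoids depending on the name Colab gives the file on disk.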



This segment provides general information about the DataFrame, including column data types and the number of non-null entries. It also shows descriptive statistics for the numerical columns: count, mean, standard deviation, minimum, quartiles (the 50% row is the median), and maximum.

In [None]:
# Step 4: Display general information about the DataFrame
print("\nData Information:")
df.info()  # Prints column names, data types, and non-null counts (returns None, so no print() needed)

# Step 5: Display descriptive statistics for numerical columns
print("\nDescriptive Statistics:")
print(df.describe())  # Shows summary statistics for numerical columns
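The two calls behave differently: info() writes its report directly and returns None, while describe() returns a DataFrame that can be indexed. A small sketch on made-up data (the brands and prices are illustrative, not taken from the uploaded file):

```python
import pandas as pd

# Hypothetical sample data for illustration only
cars = pd.DataFrame({
    "Brand": ["Toyota", "Ford", "Tesla", "BMW"],
    "Price": [10000, 20000, 30000, 40000],
})

result = cars.info()     # Prints the report itself; the return value is None
stats = cars.describe()  # Returns a DataFrame of summary statistics

# The median is reported in the row labelled '50%'
median_from_describe = stats.loc["50%", "Price"]
print(median_from_describe)
```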



This segment creates a bar plot to visualize the number of cars for each fuel type. The figure size is set, and labels are added for clarity.

In [None]:
# Step 6: Visualize the count of cars by fuel type
plt.figure(figsize=(10, 6))  # Set the figure size
sns.countplot(data=df, x='Fuel', palette='viridis')  # Create a count plot for fuel types
plt.title('Count of Cars by Fuel Type', fontsize=16)  # Set the title
plt.xlabel('Fuel Type', fontsize=14)  # Label x-axis
plt.ylabel('Number of Cars', fontsize=14)  # Label y-axis
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.show()  # Display the plot
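The bar heights in a count plot are simply the number of rows per category. As a sketch, the same numbers can be obtained as a table with value_counts(); the fuel values below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration only
cars = pd.DataFrame({
    "Fuel": ["Petrol", "Diesel", "Petrol", "Electric", "Petrol", "Diesel"],
})

# Number of rows per fuel type, sorted from most to least frequent
fuel_counts = cars["Fuel"].value_counts()
print(fuel_counts)
```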



This segment checks for and counts missing values in the DataFrame. It then visualizes missing data using a heatmap for a clear overview of where data might be missing.

In [None]:
# Step 7: Check for missing values
missing_values = df.isnull().sum()  # Count missing values in each column
print("\nMissing Values:")
print(missing_values)  # Display the count of missing values

# Step 8: Visualize missing values
plt.figure(figsize=(10, 6))  # Set the figure size
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')  # Create a heatmap of missing values
plt.title('Missing Values Heatmap', fontsize=16)  # Set the title
plt.xlabel('Columns', fontsize=14)  # Label x-axis
plt.ylabel('Rows', fontsize=14)  # Label y-axis
plt.show()  # Display the heatmap
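Raw counts are easier to judge next to the share of rows they represent. A minimal sketch, using made-up data, that computes both the count and the percentage of missing values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with some gaps, for illustration only
cars = pd.DataFrame({
    "Brand": ["Toyota", "Ford", None, "Tesla"],
    "Price": [20000, np.nan, 18000, np.nan],
})

missing_counts = cars.isnull().sum()          # Missing values per column
missing_percent = cars.isnull().mean() * 100  # Share of missing rows per column, in percent

print(pd.DataFrame({"count": missing_counts, "percent": missing_percent}))
```

The mean of a boolean column is the fraction of True values, which is why isnull().mean() gives the missing share directly.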


This segment retrieves and displays unique colors from the 'Color' column. It also visualizes the distribution of car prices using a histogram with a kernel density estimate (KDE) overlay.



In [None]:
# Step 9: Display unique values in the 'Color' column
if 'Color' in df.columns:
    unique_colors = df['Color'].unique()  # Get unique values in the 'Color' column
    print("\nUnique Colors Available in the Dataset:")
    print(unique_colors)  # Display unique colors
else:
    print("\n'Color' column not found.")  # Handle case where 'Color' column does not exist

# Step 10: Visualizing the price distribution
plt.figure(figsize=(10, 6))  # Set the figure size
sns.histplot(df['Price'], bins=10, kde=True, color='blue')  # Create a histogram with density plot
plt.title('Price Distribution of Cars', fontsize=16)  # Set the title
plt.xlabel('Price', fontsize=14)  # Label x-axis
plt.ylabel('Frequency', fontsize=14)  # Label y-axis
plt.show()  # Display the histogram
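To see what a 10-bin histogram counts, the binning can be reproduced with NumPy. This is a sketch of equal-width binning over the data range, not a description of seaborn's internals, and the prices are made up for illustration.

```python
import numpy as np

# Hypothetical prices for illustration only
prices = np.array([12000, 15000, 18000, 21000, 25000, 30000, 42000, 60000])

# Split the range from min to max into 10 equal-width bins and count values per bin
counts, edges = np.histogram(prices, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:8.0f} to {right:8.0f}: {count}")
```

Ten bins need eleven edges, and every value falls into exactly one bin, so the counts add up to the number of prices.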



This final segment summarizes key insights about the dataset, including the total number of cars, average price, most common fuel type, and most common color.

In [None]:
# Step 11: Summary of key insights
print("\nSummary of Insights:")
print(f"Total number of cars in the dataset: {df.shape[0]}")  # Total number of cars
print(f"Average price of cars: {df['Price'].mean():.2f}")  # Average price
print(f"Most common fuel type: {df['Fuel'].mode()[0]}")  # Most common fuel type
if 'Color' in df.columns:  # 'Color' is treated as optional in Step 9, so guard it here too
    print(f"Most common color: {df['Color'].mode()[0]}")  # Most common color
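One detail worth knowing about mode(): it returns every value tied for the highest count, in sorted order, so taking element [0] silently picks the first of them. A small sketch with made-up colors:

```python
import pandas as pd

# Hypothetical colors with a tie between 'Red' and 'Blue', for illustration only
colors = pd.Series(["Red", "Blue", "Red", "Blue", "White"])

modes = colors.mode()  # All values tied for most frequent, sorted
print(list(modes))
print(modes[0])        # The first tied value is the one reported by mode()[0]
```

If ties matter for the summary, printing the full list of modes is the safer choice.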
