Homework: Data Exploration

The objective of this exercise is to perform data exploration using Pandas and to calculate various statistical measures for a real dataset. By the end of the exercise, you should be able to:
Load a dataset into a Pandas DataFrame
Explore the structure and format of the data
Calculate measures of central tendency, variability, and shape
Perform basic data cleaning and preprocessing
Dataset

For this exercise, we will use the "Iris" dataset, which contains sepal and petal measurements for 150 flowers from three species of iris. The dataset ships with the scikit-learn library, which is imported in Python as "sklearn".
To load the dataset, install scikit-learn (and Pandas, if needed) and build the DataFrame with the following code:
------------------------------------------------------------------------------------------------
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
df = pd.DataFrame(data["data"], columns=data["feature_names"])
df["target"] = data["target"]
df["target"] = df["target"].map({0: "setosa", 1: "versicolor", 2: "virginica"})
------------------------------------------------------------------------------------------------


Instructions
Load the "Iris" dataset into a Pandas DataFrame using the code provided above.

Display the first 5 rows of the DataFrame using the "head()" method.

Check the shape of the DataFrame using the "shape" attribute.

Check the data types of each column using the "dtypes" attribute.

Check for missing values in the DataFrame using the "isnull()" method and the "sum()" method.

Calculate the mean, median, and mode for the "sepal length (cm)" column.

Calculate the range, variance, and standard deviation for the "petal width (cm)" column.

Calculate the skewness and kurtosis for the "sepal width (cm)" column.

Count the number of unique values in the "target" column using the "nunique()" method.

Group the data by the "target" column and calculate the mean for each group using the "groupby()" method and the "mean()" method.

In [None]:
%pip install scikit-learn pandas




In [None]:
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset
data = load_iris()

# Create a DataFrame
df = pd.DataFrame(data["data"], columns=data["feature_names"])

# Add the target column
df["target"] = data["target"]

# Map numerical target values to their respective species names
df["target"] = df["target"].map({0: "setosa", 1: "versicolor", 2: "virginica"})




In [None]:
# Display the first few rows of the DataFrame
print(df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0  setosa  
1  setosa  
2  setosa  
3  setosa  
4  setosa  


In [None]:
print("\nShape of the DataFrame:")
print(df.shape)


Shape of the DataFrame:
(150, 5)


In [None]:
print("\nData type of each column:")
print(df.dtypes)


Data type of each column:
sepal length (cm)    float64
sepal width (cm)     float64
petal length (cm)    float64
petal width (cm)     float64
target                object
dtype: object


In [None]:
print("\nMissing values per column:")
print(df.isnull().sum())


Missing values per column:
sepal length (cm)    0
sepal width (cm)     0
petal length (cm)    0
petal width (cm)     0
target               0
dtype: int64
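
The Iris data has no missing values, so the cleaning objective has little to act on here. As a minimal sketch of what that step could look like on messier data, the example below uses a small made-up DataFrame (the name `raw` and its values are hypothetical, not part of the Iris dataset) and applies `drop_duplicates()`, `fillna()`, and `dropna()`:

```python
import pandas as pd

# Hypothetical toy data with one exact duplicate row and two missing values
raw = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 4.9, None, 6.3],
    "target": ["setosa", "setosa", "setosa", "versicolor", None],
})

# Drop exact duplicate rows (row 2 repeats row 1)
deduped = raw.drop_duplicates()

# Fill the missing measurement with the column median
median_length = deduped["sepal length (cm)"].median()
filled = deduped.fillna({"sepal length (cm)": median_length})

# Drop rows whose label is missing, since a label cannot be imputed this way
cleaned = filled.dropna(subset=["target"]).reset_index(drop=True)

print(cleaned)
print(cleaned.isnull().sum())
```

Filling with the median is only one possible choice; whether to impute or drop rows depends on how much data is missing and why.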


In [None]:
print("\nStatistics for 'sepal length (cm)':")
mean_sepal_length = df["sepal length (cm)"].mean()
median_sepal_length = df["sepal length (cm)"].median()
# mode() returns a Series because ties are possible; [0] takes the first (smallest) mode
mode_sepal_length = df["sepal length (cm)"].mode()[0]

print(f"Mean: {mean_sepal_length}")
print(f"Median: {median_sepal_length}")
print(f"Mode: {mode_sepal_length}")


Statistics for 'sepal length (cm)':
Mean: 5.843333333333334
Median: 5.8
Mode: 5.0
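
`mode()` returns a Series rather than a single number because a column can have several equally frequent values. A small sketch with made-up numbers (not taken from the Iris data) shows why indexing with `[0]` keeps only the first, smallest one:

```python
import pandas as pd

# Hypothetical values in which 4.9 and 5.1 both appear twice
values = pd.Series([5.1, 4.9, 5.1, 4.9, 6.0])

modes = values.mode()   # all modes, sorted in ascending order
first_mode = modes[0]   # the single value that indexing with [0] would report

print(modes.tolist())
print(first_mode)
```

When ties matter for the analysis, it may be safer to inspect the full result of `mode()` before reducing it to one value.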


In [None]:
print("\nStatistics for 'petal width (cm)':")
range_petal_width = df["petal width (cm)"].max() - df["petal width (cm)"].min()
# var() and std() use the sample formulas (ddof=1) by default
variance_petal_width = df["petal width (cm)"].var()
std_dev_petal_width = df["petal width (cm)"].std()

print(f"Range: {range_petal_width}")
print(f"Variance: {variance_petal_width}")
print(f"Standard deviation: {std_dev_petal_width}")


Statistics for 'petal width (cm)':
Range: 2.4
Variance: 0.5810062639821029
Standard deviation: 0.7622376689603465
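
Pandas computes `var()` and `std()` with the sample formulas (divisor n - 1, `ddof=1`) by default, whereas NumPy's `np.var` defaults to the population formula (divisor n). The sketch below, on a few made-up values rather than the Iris column, shows how the two relate:

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, chosen only for illustration
widths = pd.Series([0.2, 0.4, 1.3, 1.8, 2.5])
n = len(widths)

sample_var = widths.var()                # pandas default: ddof=1
population_var = widths.var(ddof=0)      # population variance
numpy_var = np.var(widths.to_numpy())    # NumPy default: ddof=0

print(sample_var, population_var, numpy_var)
# The two estimates differ only by the factor (n - 1) / n
print(np.isclose(population_var, sample_var * (n - 1) / n))
```

With 150 rows the difference is small, but it explains why results from Pandas and NumPy may not match to the last digit.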


In [None]:
print("\nSkewness and kurtosis for 'sepal width (cm)':")
skewness_sepal_width = df["sepal width (cm)"].skew()
# kurt() reports excess kurtosis (0 for a normal distribution)
kurtosis_sepal_width = df["sepal width (cm)"].kurt()

print(f"Skewness: {skewness_sepal_width}")
print(f"Kurtosis: {kurtosis_sepal_width}")


Skewness and kurtosis for 'sepal width (cm)':
Skewness: 0.31896566471359966
Kurtosis: 0.2282490424681929
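
`skew()` returns a bias-corrected sample estimate rather than the raw third standardized moment. As an illustration, the sketch below recomputes the skewness by hand on made-up values, assuming the adjusted Fisher-Pearson formula (which is what `Series.skew()` appears to implement), and compares the result with Pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only for illustration
x = pd.Series([2.0, 2.9, 3.0, 3.1, 3.4, 4.4])
n = len(x)

dev = x - x.mean()
m2 = (dev ** 2).mean()   # biased second central moment
m3 = (dev ** 3).mean()   # biased third central moment

g1 = m3 / m2 ** 1.5                                   # uncorrected skewness
adjusted = g1 * np.sqrt(n * (n - 1)) / (n - 2)        # small-sample correction

print(adjusted, x.skew())
```

A positive value, as for "sepal width (cm)", indicates a longer tail on the right of the distribution.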


In [None]:
print("\nNumber of unique values in the 'target' column:")
unique_target_count = df["target"].nunique()
print(unique_target_count)


Number of unique values in the 'target' column:
3


In [None]:
print("\nMean of each feature by 'target':")
grouped_means = df.groupby("target").mean()
print(grouped_means)


Mean of each feature by 'target':
            sepal length (cm)  sepal width (cm)  petal length (cm)  \
target                                                               
setosa                  5.006             3.428              1.462   
versicolor              5.936             2.770              4.260   
virginica               6.588             2.974              5.552   

            petal width (cm)  
target                        
setosa                 0.246  
versicolor             1.326  
virginica              2.026  
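
`groupby()` can also report several statistics per group in one call through `agg()`. A minimal sketch on a made-up six-row table (the name `toy` and its values are hypothetical, not the Iris data):

```python
import pandas as pd

# Hypothetical subset with two rows per species
toy = pd.DataFrame({
    "petal width (cm)": [0.2, 0.4, 1.3, 1.5, 2.1, 2.5],
    "target": ["setosa", "setosa", "versicolor",
               "versicolor", "virginica", "virginica"],
})

# One row per species, one column per statistic
summary = toy.groupby("target")["petal width (cm)"].agg(["mean", "std", "count"])
print(summary)
```

The same call on the full DataFrame would show how spread out each species is, not only where its mean lies.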
