
# Task
Provide a comprehensive set of pandas learning exercises covering data loading, inspection, selection, filtering, sorting, handling missing values, aggregation, and visualization, culminating in a small project applying these concepts.

## Introduction to pandas

### Subtask:
Start with the basics of pandas, such as creating DataFrames and Series.


**Reasoning**:
Import the pandas library, then create a Series from a Python list and a DataFrame from a Python dictionary.



In [1]:
import pandas as pd

# Create a pandas Series from a Python list
s1 = pd.Series([1, 2, 3, 4, 5])

# Create a pandas DataFrame from a Python dictionary
data = {'col1': [10, 20, 30, 40], 'col2': [50, 60, 70, 80]}
df1 = pd.DataFrame(data)

# Print the created Series and DataFrame
print("Series s1:")
print(s1)
print("\nDataFrame df1:")
print(df1)

Series s1:
0    1
1    2
2    3
3    4
4    5
dtype: int64

DataFrame df1:
   col1  col2
0    10    50
1    20    60
2    30    70
3    40    80
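
A Series can also carry explicit labels instead of the default 0..n-1 index, and a DataFrame can be built from a list of row dictionaries. The sketch below is illustrative only: the names `ages` and `df_rows` and their values are made up for this example and are not part of the original exercise.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="Age")

# A DataFrame built from a list of row dictionaries; the keys become column names
df_rows = pd.DataFrame([
    {"col1": 10, "col2": 50},
    {"col1": 20, "col2": 60},
])

print(ages["Bob"])        # label-based lookup in a Series
print(df_rows.shape)      # (number of rows, number of columns)
print(df_rows.dtypes)     # one dtype per column

# Selecting a single column of a DataFrame returns a Series
print(type(df_rows["col1"]))
```

Each column of a DataFrame is itself a Series that shares the DataFrame's row index.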


## Data loading

### Subtask:
Learn how to load data from different file formats into a pandas DataFrame.


**Reasoning**:
Create a sample CSV file and then load it into a pandas DataFrame.



In [3]:
# Create a sample CSV file
with open('sample_data.csv', 'w') as f:
    f.write('Name,Age,City\n')
    f.write('Alice,25,New York\n')
    f.write('Bob,30,London\n')
    f.write('Charlie,35,Paris\n')

# Load the data from the CSV file into a pandas DataFrame
df_csv = pd.read_csv('sample_data.csv')

# Show the first few rows; as the last expression in the cell, the result is rendered automatically
df_csv.head()

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
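
The subtask mentions several file formats, while the cell above covers only CSV. As a minimal sketch, assuming the same three records, `pd.read_csv` with `sep="\t"` reads tab-separated text and `pd.read_json` with `orient="records"` reads a JSON array with one object per row. To stay self-contained, the sketch wraps in-memory strings in `io.StringIO` instead of writing files; the variable names `tsv_text`, `json_text`, `df_tsv` and `df_json` are hypothetical.

```python
import io
import pandas as pd

# Tab-separated version of the sample data, held in memory instead of a .tsv file
tsv_text = "Name\tAge\tCity\nAlice\t25\tNew York\nBob\t30\tLondon\nCharlie\t35\tParis\n"
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# JSON version of the same data: a list of objects, one per row, so orient="records"
json_text = (
    '[{"Name": "Alice", "Age": 25, "City": "New York"},'
    ' {"Name": "Bob", "Age": 30, "City": "London"},'
    ' {"Name": "Charlie", "Age": 35, "City": "Paris"}]'
)
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df_tsv)
print(df_json.dtypes)
```

Passing a file path instead of a `StringIO` object works the same way for both readers.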


## Data inspection

### Subtask:
Practice inspecting the data, including checking data types, missing values, and descriptive statistics.


**Reasoning**:
Display the column data types, check for missing values, and generate descriptive statistics for the `df_csv` DataFrame.



In [5]:
# Display a concise summary: column dtypes, non-null counts and memory usage
print("Data Types:")
df_csv.info()

# Check for missing values
print("\nMissing Values:")
print(df_csv.isnull().sum())

# Generate descriptive statistics for numerical columns
print("\nDescriptive Statistics:")
print(df_csv.describe())

Data Types:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   City    3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes

Missing Values:
Name    0
Age     0
City    0
dtype: int64

Descriptive Statistics:
        Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0
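
The sample data has no gaps, so every missing-value count above is zero. A hypothetical variant with one missing `Age` and one missing `City` shows what the same checks report when values are absent; the frame `df_missing` is invented for illustration.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, np.nan, 35],
    "City": ["New York", "London", None],
})

# NaN is a float, so the Age column is stored as float64 rather than int64
print(df_missing.dtypes)

# isnull() treats both NaN and None as missing; sum() counts them per column
missing_counts = df_missing.isnull().sum()
print(missing_counts)

# describe() skips missing values, so the Age count drops to 2
print(df_missing.describe())
```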


## Data selection

### Subtask:
Learn how to select specific rows and columns from a DataFrame using various methods.


**Reasoning**:
Select columns with single and double bracket notation, then select a row by position with `.iloc` and a single value by label with `.loc`.



In [7]:
# Select and print the 'Name' column using single bracket notation
print("Selecting 'Name' column:")
print(df_csv['Name'])

# Select and print the 'Name' and 'Age' columns using double bracket notation
print("\nSelecting 'Name' and 'Age' columns:")
print(df_csv[['Name', 'Age']])

# Select and print the first row using .iloc
print("\nSelecting the first row using .iloc:")
print(df_csv.iloc[0])

# Select and print the 'City' of the row labeled 1 using .loc (the second row, because df_csv has the default 0-based index)
print("\nSelecting the 'City' for the second row using .loc:")
print(df_csv.loc[1, 'City'])

Selecting 'Name' column:
0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

Selecting 'Name' and 'Age' columns:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

Selecting the first row using .iloc:
Name       Alice
Age           25
City    New York
Name: 0, dtype: object

Selecting the 'City' for the second row using .loc:
London
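
Because `df_csv` uses the default 0-based integer index, `.loc[1, 'City']` and `.iloc[1]` happen to point at the same row. The sketch below separates the two accessors: it rebuilds the same sample data so it runs on its own, slices with both, and then re-indexes by `Name`. The names `df`, `first_two`, `through_label_2`, `by_name`, `bob_city` and `second_city` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"],
})

# .iloc slices by position and excludes the end point: positions 0 and 1
first_two = df.iloc[0:2]

# .loc slices by label and includes the end point: labels 0, 1 and 2
through_label_2 = df.loc[0:2]

# After set_index, the row labels are names, so .loc needs a name
by_name = df.set_index("Name")
bob_city = by_name.loc["Bob", "City"]

# .iloc still works by position; get_loc converts the column label to its position
second_city = by_name.iloc[1, by_name.columns.get_loc("City")]

print(len(first_two), len(through_label_2))
print(bob_city, second_city)
```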


## Data filtering

### Subtask:
Practice filtering data based on conditions.


**Reasoning**:
Filter rows with boolean conditions on the 'Age' and 'City' columns and display the results.



In [8]:
# Filter DataFrame where 'Age' is greater than 28
filtered_age_df = df_csv[df_csv['Age'] > 28]
print("DataFrame filtered by Age > 28:")
display(filtered_age_df)

# Filter DataFrame where 'City' is 'New York'
filtered_city_df = df_csv[df_csv['City'] == 'New York']
print("\nDataFrame filtered by City == 'New York':")
display(filtered_city_df)

DataFrame filtered by Age > 28:
      Name  Age    City
1      Bob   30  London
2  Charlie   35   Paris

DataFrame filtered by City == 'New York':
    Name  Age      City
0  Alice   25  New York
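
Conditions can also be combined. Each comparison needs its own parentheses, joined with the element-wise operators `&` (and), `|` (or) and `~` (not), because Python's `and`/`or` raise an error on a boolean Series. `isin` tests membership in a list, and `DataFrame.query` expresses a filter as a string. A minimal sketch on a rebuilt copy of the sample data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"],
})

# Older than 28 AND not living in Paris: element-wise & on two boolean masks
mask = (df["Age"] > 28) & (df["City"] != "Paris")
older_not_paris = df[mask]

# City is either London or Paris
in_cities = df[df["City"].isin(["London", "Paris"])]

# The same filter as `mask`, written as a query string
via_query = df.query('Age > 28 and City != "Paris"')

print(older_not_paris)
print(in_cities)
```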
