
# 1. Importing Pandas

In [1]:
import pandas as pd

# 2. Creating a DataFrame
From a Dictionary:

In [2]:
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)

From a CSV File:

In [4]:
df = pd.read_csv('/content/sample_data/california_housing_test.csv')
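
The path above points at sample data that ships with Colab, so it will not exist elsewhere. As a minimal sketch that runs anywhere, the same call can read from an in-memory buffer. The CSV text and the name df_csv below are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,Los Angeles\n"

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)    # (rows, columns)
print(df_csv.columns)  # column names taken from the header line
```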

From a List of Lists:

In [5]:
data = [[1, 'Alice'], [2, 'Bob'], [3, 'Charlie']]
df = pd.DataFrame(data, columns=['ID', 'Name'])

# 3. Exploring DataFrames
View the first few rows:

In [6]:
df.head()

   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie


View the last few rows:

In [7]:
df.tail()

   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie


Summary statistics:

In [8]:
df.describe()

        ID
count  3.0
mean   2.0
std    1.0
min    1.0
25%    1.5
50%    2.0
75%    2.5
max    3.0


Check column names:

In [9]:
df.columns

Index(['ID', 'Name'], dtype='object')
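
head() and tail() return five rows by default, which is why both show the whole three-row DataFrame above. The sketch below passes an explicit row count and reads the dimensions from shape. The name df_demo is an illustrative stand-in for the DataFrame used in this section.

```python
import pandas as pd

# Stand-in for the ID/Name DataFrame used in this section
df_demo = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})

first_two = df_demo.head(2)     # first 2 rows instead of the default 5
last_one = df_demo.tail(1)      # last row only
n_rows, n_cols = df_demo.shape  # (rows, columns) tuple

print(first_two)
print(last_one)
print(n_rows, n_cols)
```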

# 4. Selecting Columns and Rows
Select a specific column:

In [10]:
df['Name']

0      Alice
1        Bob
2    Charlie


Select multiple columns:

In [12]:
df[['Name', 'ID']]

      Name  ID
0    Alice   1
1      Bob   2
2  Charlie   3


Select rows by index position (using .iloc):

In [20]:
df.iloc[0]  # First row

ID          1
Name    Alice


In [21]:
df.iloc[0:2]  # Rows at positions 0 and 1 (the end position is excluded)

   ID   Name
0   1  Alice
1   2    Bob


Select rows by label (using .loc):

In [18]:
df.loc[0]  # First row by label

ID          1
Name    Alice


In [17]:
df.loc[0:2]  # Rows labeled 0 through 2 (the end label is included)

   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
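
With the default integer index, .iloc and .loc look almost interchangeable. The difference is easier to see with non-integer labels. This is a minimal sketch, and the string index 'a', 'b', 'c' is an assumption added for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']},
    index=['a', 'b', 'c'],  # hypothetical string labels
)

by_position = df_demo.iloc[0:2]      # positions 0 and 1; end position excluded
by_label = df_demo.loc['a':'c']      # labels 'a' through 'c'; end label included
one_cell = df_demo.loc['c', 'Name']  # single value by row label and column label

print(len(by_position), len(by_label), one_cell)
```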


# 5. Filtering Data
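The examples from here on that use the Age and City columns assume df is the dictionary-based DataFrame from section 2, not the ID/Name DataFrame created last; run against the ID/Name DataFrame they raise a KeyError.

Filter by column values: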

In [None]:
df[df['Age'] > 30]  # Filter rows where Age is greater than 30

Multiple conditions:

In [None]:
df[(df['Age'] > 30) & (df['City'] == 'Chicago')]
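
A runnable sketch of the filters above, rebuilt on the dictionary data from section 2. Each condition needs its own parentheses because & and | bind more tightly than comparison operators. The variable names are illustrative.

```python
import pandas as pd

df_people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Single condition: a boolean Series selects the matching rows
older = df_people[df_people['Age'] > 30]

# OR of two conditions, each wrapped in parentheses
either = df_people[(df_people['Age'] < 30) | (df_people['City'] == 'Chicago')]

# Membership test against a list of allowed values
in_cities = df_people[df_people['City'].isin(['New York', 'Chicago'])]

print(older['Name'].tolist())
print(either['Name'].tolist())
print(in_cities['Name'].tolist())
```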

# 6. Adding/Modifying Columns
Add a new column:

In [24]:
df['Country'] = 'USA'  # Adds a new column 'Country' with the same value for all rows

Modify an existing column:

In [None]:
df['Age'] = df['Age'] + 1  # Increment each value in 'Age' column by 1
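
New columns can also be derived from existing ones. The sketch below shows both in-place assignment and assign(), which returns a new DataFrame instead of changing the original. The column names AgeInMonths and IsAdult are invented for this example.

```python
import pandas as pd

df_people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Derive a new column from an existing one (modifies df_people)
df_people['AgeInMonths'] = df_people['Age'] * 12

# assign() returns a new DataFrame and leaves df_people unchanged
with_flag = df_people.assign(IsAdult=df_people['Age'] >= 18)

print(df_people.columns.tolist())
print(with_flag.columns.tolist())
```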

# 7. Handling Missing Data
Check for missing values:

In [25]:
df.isnull().sum()  # Count of missing values in each column

ID         0
Name       0
Country    0


Drop rows with missing values:

In [None]:
df.dropna()  # Returns a new DataFrame without the rows that contain NaN; df itself is unchanged

Fill missing values:

In [None]:
df.fillna(0)  # Returns a new DataFrame with NaN replaced by 0; df itself is unchanged
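
The DataFrame in this notebook has no missing values, so the calls above have nothing to act on. Here is a small sketch with one deliberately missing Age, using made-up data, that shows what each call returns.

```python
import pandas as pd

# Made-up data with one missing Age
df_gaps = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, float('nan'), 35],
})

missing_per_column = df_gaps.isnull().sum()  # NaN count per column
dropped = df_gaps.dropna()                   # new DataFrame without Bob's row
# Fill only the Age column, here with the mean of the known ages
filled = df_gaps.fillna({'Age': df_gaps['Age'].mean()})

print(missing_per_column)
print(dropped)
print(filled)
```

Both dropna() and fillna() leave df_gaps untouched; assign the result back if the change should persist.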

# 8. Sorting Data
Sort by a single column:

In [None]:
df.sort_values(by='Age', ascending=False)  # Sort by Age in descending order

Sort by multiple columns:

In [None]:
df.sort_values(by=['City', 'Age'], ascending=[True, False])  # Sort by City (ascending), then by Age (descending)
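
A runnable sketch of the multi-column sort. The rows, including the extra person Dana and the city Boston, are invented so that each city has more than one entry and the second sort key has a visible effect.

```python
import pandas as pd

# Invented rows: two people per city so the tie-breaking key matters
df_people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 28],
    'City': ['Chicago', 'Boston', 'Chicago', 'Boston'],
})

# City ascending, then Age descending within each city
sorted_df = df_people.sort_values(by=['City', 'Age'], ascending=[True, False])

# sort_values keeps the original index labels; reset them if a clean 0..n-1 index is wanted
sorted_reset = sorted_df.reset_index(drop=True)

print(sorted_df)
print(sorted_reset)
```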

# 9. Grouping Data
Group by a column and calculate aggregate statistics:

In [None]:
df.groupby('City').mean(numeric_only=True)  # Mean of each numeric column per city; non-numeric columns such as Name are skipped

Multiple aggregate functions:



In [None]:
df.groupby('City').agg({'Age': ['mean', 'sum'], 'Name': 'count'})
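
A self-contained sketch of grouping, using invented rows with two people per city. It selects the Age column before taking the mean and uses named aggregation, an alternative to the dictionary form above that produces flat column names. The output column names mean_age, total_age, and people are arbitrary choices.

```python
import pandas as pd

# Invented rows: two people per city
df_people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 28],
    'City': ['Chicago', 'Boston', 'Chicago', 'Boston'],
})

# Mean of a single numeric column per group
mean_age = df_people.groupby('City')['Age'].mean()

# Named aggregation: output_column=(input_column, function)
summary = df_people.groupby('City').agg(
    mean_age=('Age', 'mean'),
    total_age=('Age', 'sum'),
    people=('Name', 'count'),
)

print(mean_age)
print(summary)
```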

# 10. Merging and Joining DataFrames
Merge two DataFrames (like SQL JOIN):

In [26]:
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2], 'Age': [25, 30]})
merged_df = pd.merge(df1, df2, on='ID', how='inner')  # Merge by 'ID'
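
The how argument controls which rows survive the merge. As a brief sketch on the same two DataFrames, an inner join keeps only IDs present in both, while a left join keeps every row of df1 and fills the missing Age with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2], 'Age': [25, 30]})

# Inner join: only IDs 1 and 2 appear in both DataFrames
inner = pd.merge(df1, df2, on='ID', how='inner')

# Left join: all rows of df1 are kept; Charlie has no match, so Age is NaN
left = pd.merge(df1, df2, on='ID', how='left')

print(inner)
print(left)
```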

# 11. Reshaping Data
Pivot table:

In [None]:
pivot_df = df.pivot_table(values='Age', index='City', columns='Name', aggfunc='mean')

Melt DataFrame:

In [None]:
melted_df = pd.melt(df, id_vars=['City'], value_vars=['Age', 'Country'])
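
Pivoting and melting are roughly inverse operations: one spreads values into columns, the other stacks columns back into rows. The sketch below illustrates the round trip on a made-up table of scores; the columns Subject and Score are assumptions that do not appear elsewhere in this notebook.

```python
import pandas as pd

# Made-up long-format data: one row per (Name, Subject) pair
df_scores = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Bob', 'Bob'],
    'Subject': ['Math', 'Art', 'Math', 'Art'],
    'Score': [90, 85, 70, 95],
})

# Long -> wide: one row per Name, one column per Subject
wide = df_scores.pivot_table(values='Score', index='Name',
                             columns='Subject', aggfunc='mean')

# Wide -> long: stack the Art and Math columns back into rows
long = wide.reset_index().melt(id_vars=['Name'], value_vars=['Art', 'Math'],
                               var_name='Subject', value_name='Score')

print(wide)
print(long)
```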

# 12. Saving Data to a File
Save DataFrame to CSV:

In [None]:
df.to_csv('output.csv', index=False)  # Write DataFrame to CSV, without the index
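
One way to check a saved file is to read it straight back. This sketch writes to a temporary directory so it leaves nothing behind. The names df_out and df_back are illustrative.

```python
import os
import tempfile

import pandas as pd

df_out = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.csv')
    df_out.to_csv(path, index=False)  # index=False omits the row labels
    df_back = pd.read_csv(path)       # read the file back while it still exists

print(df_back)
```

Without index=False, the row labels are written as an extra unnamed first column, which then shows up as "Unnamed: 0" when the file is read back.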

Save DataFrame to Excel:

In [None]:
df.to_excel('output.xlsx', index=False)  # Requires an Excel writer engine such as openpyxl to be installed