# Pandas DataFrame Tutorial

This notebook covers essential operations and concepts for working with pandas DataFrames.

## Table of Contents
1. [Importing pandas and creating DataFrames](#1.-Importing-pandas-and-creating-DataFrames)
2. [Basic DataFrame operations](#2.-Basic-DataFrame-operations)
3. [Indexing and selection](#3.-Indexing-and-selection)
4. [Handling missing data](#4.-Handling-missing-data)
5. [Data manipulation](#5.-Data-manipulation)
6. [Grouping and aggregation](#6.-Grouping-and-aggregation)
7. [Merging and joining DataFrames](#7.-Merging-and-joining-DataFrames)
8. [Basic data visualization](#8.-Basic-data-visualization)

## 1. Importing pandas and creating DataFrames

In [None]:
import pandas as pd
import numpy as np

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Rami', 'Osama', 'Nael', 'Yara'],
    'cash': [25, 30, 35, 28],
    'birthdate': ['1990-01-01', '1951-02-02', '1892-03-03', '1970-04-04']
}
df = pd.DataFrame(data)
print(df)

## 2. Basic DataFrame operations

In [None]:
# Display basic information about the DataFrame
df.info()

In [None]:
# Display summary statistics (numeric columns only by default)
df.describe()

In [None]:
# Display the first few rows
df.head(2)

In [None]:
# Display the last few rows
df.tail(2)

In [None]:
# Get column names
df.columns

In [None]:
# Get data types of columns
df.dtypes

In [None]:
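# Fix the date column: parse the 'birthdate' strings into datetime64 values
df['birthdate'] = pd.to_datetime(df['birthdate'])
df.dtypes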

## 3. Indexing and selection

In [None]:
# Select a single column
df['Name']

In [None]:
# Select multiple columns
df[['Name', 'cash']]



In [None]:
# Select a single row by its index label
df.loc[1]



In [None]:
# Select rows by condition
df[df['cash'] > 30]
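
Conditions can also be combined. The sketch below rebuilds a small frame with the same `Name` and `cash` values (the name `people` is only for illustration) and shows `&`, `|`, and `isin`. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons.

```python
import pandas as pd

# Same sample values as the tutorial's df; `people` is an illustrative name
people = pd.DataFrame({
    'Name': ['Rami', 'Osama', 'Nael', 'Yara'],
    'cash': [25, 30, 35, 28],
})

# AND: both conditions must hold
mid_range = people[(people['cash'] > 25) & (people['cash'] < 35)]

# OR: either condition may hold
extremes = people[(people['cash'] <= 25) | (people['cash'] >= 35)]

# isin: keep rows whose value appears in a list
selected = people[people['Name'].isin(['Rami', 'Yara'])]

print(mid_range)
print(extremes)
print(selected)
```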


In [None]:
# Select specific rows and columns
df.loc[2:3, ['Name', 'birthdate']]
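
The `.loc` slice above is label-based and includes both endpoints, which differs from position-based `.iloc`. A minimal comparison, assuming the default integer index and an illustrative frame named `people`:

```python
import pandas as pd

# Illustrative frame with the default 0..3 integer index
people = pd.DataFrame({
    'Name': ['Rami', 'Osama', 'Nael', 'Yara'],
    'cash': [25, 30, 35, 28],
})

# .loc slices by label and includes the end label: rows 2 and 3
label_slice = people.loc[2:3, ['Name']]

# .iloc slices by position and excludes the end position: row 2 only
position_slice = people.iloc[2:3, [0]]

print(label_slice)
print(position_slice)
```

With the default index, labels and positions happen to coincide, so the only visible difference is whether the end of the slice is included.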

## 4. Handling missing data

In [None]:
# Create a DataFrame with missing values
df_missing = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12]
})

df_missing

In [None]:
# Check for missing values
df_missing.isnull()

In [None]:
# Drop rows with missing values
df_missing.dropna()

In [None]:
# Fill missing values
df_missing.fillna(0)
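
A few variations on the same calls are often useful in practice. The sketch below reuses the `df_missing` values and shows counting gaps per column, dropping rows based on one column only, and filling each column with its own mean; whether a mean is an appropriate fill value depends on the data.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# Count missing values in each column
missing_counts = df_missing.isnull().sum()

# Drop a row only when column 'A' is missing
rows_with_a = df_missing.dropna(subset=['A'])

# Fill each column's gaps with that column's mean
filled_with_mean = df_missing.fillna(df_missing.mean())

print(missing_counts)
print(rows_with_a)
print(filled_with_mean)
```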

## 5. Data manipulation

In [None]:
# Add a new column
df['Country'] = ['JOR', 'JOR', 'JOR', 'JOR']
print(df)


In [None]:
# Rename columns
df = df.rename(columns={'Name': 'Full_Name'})
df


In [None]:
# Sort by birthdate, newest first (returns a new DataFrame; df itself is unchanged)
df.sort_values('birthdate', ascending=False)


In [None]:
# Apply a function to a column
df['More_Cash'] = df['cash'].apply(lambda x: x**2)
df
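
For simple arithmetic like squaring, an operator applied to the whole column gives the same values as `apply` and is usually faster, since it avoids calling a Python function once per row. A small sketch using the same `cash` values (the name `people` is illustrative):

```python
import pandas as pd

people = pd.DataFrame({'cash': [25, 30, 35, 28]})

# Element-by-element with apply, as in the cell above
squared_apply = people['cash'].apply(lambda x: x**2)

# Vectorized: the operator is applied to the whole column at once
squared_vectorized = people['cash'] ** 2

print(squared_apply.tolist())
print(squared_vectorized.tolist())
```

`apply` remains the right tool when the per-value logic cannot be expressed with column-wise operators.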

## 6. Grouping and aggregation

In [None]:
# Create a larger DataFrame for grouping
df_large = pd.DataFrame({
    'Category': ['red ball', 'blue ball', 'red ball', 'blue ball', 'red ball', 'blue ball'],
    'Value': [10, 21, 13, 34, 18, 27]
})
df_large

In [None]:
# Group by category and calculate sum
df_large.groupby('Category')['Value'].sum()


In [None]:
# Group by category and calculate multiple aggregations
df_large.groupby('Category').agg({
    'Value': ['mean', 'sum', 'count']
})
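
The dictionary form above produces two-level column names. As an alternative sketch, named aggregation gives flat, descriptive columns; the output names `mean_value`, `total_value`, and `n_rows` are chosen here for illustration.

```python
import pandas as pd

df_large = pd.DataFrame({
    'Category': ['red ball', 'blue ball', 'red ball', 'blue ball', 'red ball', 'blue ball'],
    'Value': [10, 21, 13, 34, 18, 27]
})

# Each keyword is an output column: (input column, aggregation)
summary = df_large.groupby('Category').agg(
    mean_value=('Value', 'mean'),
    total_value=('Value', 'sum'),
    n_rows=('Value', 'count'),
)

print(summary)
```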

## 7. Merging and joining DataFrames

In [None]:
# Create two DataFrames to merge
df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['Osama', 'Yara', 'Nael', 'Rami']})
df2 = pd.DataFrame({'ID': [2, 3, 4, 5], 'Salary': [50000, 60000, 70000, 80000]})

display(df1)
display(df2)

In [None]:
# Perform an inner join
pd.merge(df1, df2, on='ID')

In [None]:
# Perform a left join
pd.merge(df1, df2, on='ID', how='left')
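
To see which rows matched in a join, an outer merge with `indicator=True` adds a `_merge` column that marks each row as `left_only`, `right_only`, or `both`. A sketch with the same `df1` and `df2` values:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['Osama', 'Yara', 'Nael', 'Rami']})
df2 = pd.DataFrame({'ID': [2, 3, 4, 5], 'Salary': [50000, 60000, 70000, 80000]})

# Keep every ID from both sides and record where each row came from
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(outer)
```

Here ID 1 has no salary and ID 5 has no name, so those cells are filled with NaN.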

In [None]:
# Concatenate DataFrames side by side (axis=1); rows align on the index, not on 'ID'
pd.concat([df1, df2], axis=1)
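
`pd.concat` behaves quite differently depending on `axis`. The sketch below contrasts the side-by-side result above with row-wise stacking; the frames `first` and `second` are hypothetical and share the same columns.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['Osama', 'Yara', 'Nael', 'Rami']})
df2 = pd.DataFrame({'ID': [2, 3, 4, 5], 'Salary': [50000, 60000, 70000, 80000]})

# axis=1 places columns next to each other, so 'ID' appears twice
side_by_side = pd.concat([df1, df2], axis=1)

# Hypothetical frames with identical columns, for row-wise stacking
first = pd.DataFrame({'ID': [1, 2], 'Name': ['Osama', 'Yara']})
second = pd.DataFrame({'ID': [3, 4], 'Name': ['Nael', 'Rami']})

# The default axis=0 stacks rows; ignore_index=True renumbers them 0..n-1
stacked = pd.concat([first, second], ignore_index=True)

print(side_by_side.columns.tolist())
print(stacked)
```

When rows should be matched on a key such as `ID`, `pd.merge` is generally the better fit than `axis=1` concatenation.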

## 8. Basic data visualization

In [None]:
# Recreate the DataFrame from section 6 so this section runs on its own
df_large = pd.DataFrame({
    'Category': ['red ball', 'blue ball', 'red ball', 'blue ball', 'red ball', 'blue ball'],
    'Value': [10, 21, 13, 34, 18, 27]
})
df_large

In [None]:
import plotly.express as px

# Create a bar chart
px.bar(df_large, x='Category', y='Value')
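
Because each category appears in several rows, the bar chart above stacks one segment per row. If a single bar per category is preferred, one option is to aggregate first. The sketch below only prepares the data with pandas; the name `totals` is illustrative.

```python
import pandas as pd

df_large = pd.DataFrame({
    'Category': ['red ball', 'blue ball', 'red ball', 'blue ball', 'red ball', 'blue ball'],
    'Value': [10, 21, 13, 34, 18, 27]
})

# One row per category, with 'Category' kept as a regular column
totals = df_large.groupby('Category', as_index=False)['Value'].sum()

print(totals)
# totals can then be plotted with the same call: px.bar(totals, x='Category', y='Value')
```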

In [None]:
# Create a pie chart
px.pie(df_large, names='Category', values='Value')

In [None]:
# Create a histogram (counts the rows in each category)
px.histogram(df_large, x='Category')