### What is Pandas?
Pandas is a Python library used for data manipulation and analysis.

It provides two main data structures:

- Series → One-dimensional data (like a single column in Excel or a single list)

- DataFrame → Two-dimensional data (like a table or spreadsheet)

In [1]:
import pandas as pd
print(pd.__version__)

2.3.0


In [2]:
## Introduction to Pandas

s = pd.Series([10, 20, 30, 40])
print(s)

print("-" * 50)

df = pd.DataFrame({
    'Name': ['Islam', 'Ahmed', 'Saad'],
    'Age': [20, 30, 40],
    'Job': ['Teacher', 'Programmer', 'Designer']
})
print(df)

0    10
1    20
2    30
3    40
dtype: int64
--------------------------------------------------
    Name  Age         Job
0  Islam   20     Teacher
1  Ahmed   30  Programmer
2   Saad   40    Designer


In [3]:
## Creating Objects

data = [100, 200, 300]
s = pd.Series(data)
# print(s)  # The index is created automatically: 0, 1, 2
# ----------------------------------------------------------
s1 = pd.Series(data, index=['a', 'b', 'c'])
# print(s1)  # Now the labels are a, b, c.
# ----------------------------------------------------------
s2 = pd.Series({'a': 100, 'b': 200, 'c': 300})
# print(s2)  # same as s1: the dict keys become the labels
# ----------------------------------------------------------
data = {
    'Product': ['Book', 'Pen', 'Notebook'],
    'Price': [12.5, 1.5, 3.0]
}
df = pd.DataFrame(data)
# print(df)  # index will be 0, 1, 2 .. and Product and Price are the columns
# ----------------------------------------------------------
data = [
    {'Product': 'Book', 'Price': 12.5},
    {'Product': 'Pen', 'Price': 1.5},
    {'Product': 'Notebook', 'Price': 3.0}
]
df = pd.DataFrame(data)
# print(df)   # index will be 0, 1, 2 .. and Product and Price are the columns
# ----------------------------------------------------------
data = [
    ['Book', 12.5],
    ['Pen', 1.5],
    ['Notebook', 3.0]
]
df = pd.DataFrame(data, columns=['Product', 'Price'])
# print(df)  # index will be 0, 1, 2 .. and Product and Price are the columns
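The construction routes in the cell above are meant to produce the same objects. A minimal sketch of how that could be checked with `equals()`; the variable names here are illustrative and not part of the original notebook:

```python
import pandas as pd

# A Series built from a list plus labels, and one built from a dict
s_from_list = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
s_from_dict = pd.Series({'a': 100, 'b': 200, 'c': 300})

# The same table expressed as a dict of columns, a list of dicts, and a list of rows
from_columns = pd.DataFrame({
    'Product': ['Book', 'Pen', 'Notebook'],
    'Price': [12.5, 1.5, 3.0],
})
from_records = pd.DataFrame([
    {'Product': 'Book', 'Price': 12.5},
    {'Product': 'Pen', 'Price': 1.5},
    {'Product': 'Notebook', 'Price': 3.0},
])
from_rows = pd.DataFrame(
    [['Book', 12.5], ['Pen', 1.5], ['Notebook', 3.0]],
    columns=['Product', 'Price'],
)

# equals() compares shape, labels, values, and dtypes
series_match = s_from_list.equals(s_from_dict)
frames_match = from_columns.equals(from_records) and from_columns.equals(from_rows)
print(series_match, frames_match)
```

Which route to use mostly depends on the shape the data already has: a dict of columns, a list of records, or a list of rows.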




In [4]:
## Reading & Writing Data

df = pd.read_csv('data/sample.csv')
# print(df)
# ----------------------------------------------------------
# Writing to a new CSV file (without index)
df.to_csv('data/new_sample.csv', index=False)
# ----------------------------------------------------------
data = [
    ['Book', 12.5],
    ['Pen', 1.5],
    ['Notebook', 3.0],
    ['Notebook', 3.0],
    ['Notebook', 3.0],
    ['Notebook', 3.0],
]
df = pd.DataFrame(data, columns=['Product', 'Price'])
df.to_json('data/sample.json')
# ----------------------------------------------------------
df_json = pd.read_json('data/sample.json')
# print(df_json)
# ----------------------------------------------------------
"""
Pandas relies on a third-party engine to read and write Excel files (.xlsx).
The usual engine, openpyxl, is not installed with pandas by default.
Install it with: pip3 install openpyxl
"""
df.to_excel('data/sample.xlsx', index=False)  # remove index
df_excel = pd.read_excel('data/sample.xlsx')
# print(df_excel)
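The `index=False` argument used above matters when the file is read back. A small sketch of the effect, using an in-memory buffer (`io.StringIO`) instead of the `data/` files so it runs without touching disk; the buffer approach is an assumption made for this illustration:

```python
import io
import pandas as pd

df = pd.DataFrame({'Product': ['Book', 'Pen'], 'Price': [12.5, 1.5]})

# With no path, to_csv() returns the CSV text as a string
csv_with_index = df.to_csv()                # default: the row index is written too
csv_without_index = df.to_csv(index=False)  # only the real columns are written

# Reading the first version back turns the saved index into an extra column
back_with_index = pd.read_csv(io.StringIO(csv_with_index))
back_without_index = pd.read_csv(io.StringIO(csv_without_index))

# index_col=0 tells read_csv to treat the first column as the index again
restored = pd.read_csv(io.StringIO(csv_with_index), index_col=0)

print(list(back_with_index.columns))     # ['Unnamed: 0', 'Product', 'Price']
print(list(back_without_index.columns))  # ['Product', 'Price']
print(list(restored.columns))            # ['Product', 'Price']
```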

In [5]:
## Basic Data Exploration

# print(df.head())  # First 5 rows (default)
# print(df.head(3))  # first 3 rows
# print(df.tail())  # Last 5 (default)
# print(df.tail(3))  # Last 3
# ----------------------------------------------------------
# print(df.shape)  # returns (number of rows, number of columns)
# print(list(df.columns))  # List of column names
# print(df.index) # Row index (usually 0,1,2,...)
# df.info()  # prints column names, non-null counts, and data types (it prints by itself and returns None, so no print() is needed)
# print(df.describe())  # count, mean, std, min, max, and percentiles for the numeric columns
print(df.dtypes) # Helps you check if columns are numbers, strings, dates, etc.



Product     object
Price      float64
dtype: object


In [6]:
## Accessing Data

# accessing columns
df['Product']  # single column (returns a Series)
df[['Product', 'Price']]  # multiple columns (returns a DataFrame); note the double brackets
# ----------------------------------------------------------
# accessing rows
df.loc[0]  # access by index label (with a labeled index this would be df.loc['a'])
df.iloc[0]  # access by position; iloc means "integer location"
# ----------------------------------------------------------
# Accessing Rows + Columns Together
df.loc[0, :]  # Row 0, all columns
df.loc[0, 'Product']  # Row 0, specific column 'Product'
df.iloc[0, 0]  # First row, first column
# ----------------------------------------------------------
# Slicing Rows
df.iloc[:3]  # first 3 rows (positions 0-2); the end position 3 is excluded, as in regular Python list slicing
df.loc[df['Price'] < 2]  # Rows where Price < 2
df[df['Price'] < 2]  # same as the previous line, without .loc


  Product  Price
1     Pen    1.5
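With the default 0, 1, 2 index, `loc` and `iloc` can look interchangeable. A minimal sketch with a labeled index makes the difference visible; the labels `'a'`, `'b'`, `'c'` are hypothetical and chosen only for this example:

```python
import pandas as pd

df = pd.DataFrame(
    {'Product': ['Book', 'Pen', 'Notebook'], 'Price': [12.5, 1.5, 3.0]},
    index=['a', 'b', 'c'],  # hypothetical row labels
)

by_label = df.loc['b', 'Price']   # row labeled 'b', column named 'Price'
by_position = df.iloc[1, 1]       # second row, second column

# Label slices include the end label; position slices exclude the end position
label_slice = df.loc['a':'b']     # rows 'a' and 'b'
position_slice = df.iloc[0:1]     # row at position 0 only

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

The inclusive end on `loc` slices is a common surprise for readers used to Python list slicing.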


### Summary:

- `loc[]` → row label + column label
- `iloc[]` → row position + column position (integers starting at 0)

- Accessing Series

```python
df['Product']       # Series
df.loc[:, 'Product'] # Same as above → Series
```

- Accessing DataFrame

```python
df[['Product', 'Price']]  # DataFrame
df.loc[:, ['Product', 'Price']]  # Same result → DataFrame
```


| Method      | Works On       | Uses Labels or Positions? |
| ----------- | -------------- | ------------------------- |
| `df['col']` | Columns        | Label (name)              |
| `df.loc[]`  | Rows & Columns | Label (index or name)     |
| `df.iloc[]` | Rows & Columns | Position (numbers)        |



In [None]:
## Basic Filtering & Sorting

# Boolean indexing:
# df['Price'] < 2 → gives True/False for each row
# Passing that into df[] keeps only the rows marked True.
df[df['Price'] < 2]
df.loc[df['Price'] < 2]  # same as the line above

# Multiple conditions: wrap each condition in parentheses and combine with &
df[(df['Price'] > 2) & (df['Product'] == 'Book')]
df.loc[(df['Price'] > 2) & (df['Product'] == 'Book')]  # same as the line above

# .isin(): Checks if column value is in the provided list.
df[df['Product'].isin(['Pen', 'Book'])]

# Sort by Column Values
df.sort_values(by='Product')  # sort by asc (default)
df.sort_values(by='Product', ascending=False)  # sort by desc
df.sort_values(by=['Product', 'Price'], ascending=[False, True])  # sort by multiple columns

# Sort by index
df.sort_index()  # sort by index asc
df.sort_index(ascending=False)  # sort by index desc

    Product  Price
5  Notebook    3.0
4  Notebook    3.0
3  Notebook    3.0
2  Notebook    3.0
1       Pen    1.5
0      Book   12.5
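The filtering and sorting calls above can be sketched on a small frame to show what each step returns. The `|` (or) and `~` (not) operators are not used in the cell above and are included here only as an illustration of how conditions combine:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Book', 'Pen', 'Notebook'],
    'Price': [12.5, 1.5, 3.0],
})

# A comparison on a column yields a boolean Series, one value per row
mask = df['Price'] > 2

# Conditions combine with & (and), | (or), ~ (not); each needs its own parentheses
cheap_or_book = df[(df['Price'] < 2) | (df['Product'] == 'Book')]
not_pen = df[~df['Product'].isin(['Pen'])]

# sort_values returns a new DataFrame; the original row order is left untouched
by_price_desc = df.sort_values(by='Price', ascending=False)

print(mask.tolist())
print(cheap_or_book['Product'].tolist())
print(not_pen['Product'].tolist())
print(by_price_desc['Product'].tolist())
```

Because sorting returns a new object, the result has to be assigned to a name (as with `by_price_desc`) if it is needed later.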


In [None]:
## Basic Statistics
# Pandas has built-in statistical functions that help you quickly summarize numerical data.

# Summary Statistics
df.describe()  # quick overview of the numeric columns (count, mean, std, min, quartiles, max)

# Common Aggregation Functions
df['Price'].mean()  # average
df['Price'].sum()  # sum
df['Price'].median()  # median
df['Price'].min()  # min
df['Price'].max()  # max
df['Price'].std()  # standard deviation: how spread out the values are around the mean

# Counting Unique Values:
df['Product'].value_counts()  # How many times each value appears
df['Product'].nunique()  # number of unique values
df['Product'].unique()  # array of the unique values

# Apply Statistics to Conditions
df.loc[df['Price'] > 2, 'Price'].mean()  # Mean Price of items where Price > 2 (row filter and column picked in one .loc call)

# Correlation Between Columns
df.corr(numeric_only=True)  # numeric_only=True to avoid text column errors



       Price
Price    1.0


### What is Correlation?

👉 Correlation is a statistical measure of how closely two numeric columns move together, on a scale from -1 to +1.

- If they increase/decrease together → positive correlation (closer to +1)
- If one increases while the other decreases → negative correlation (closer to -1)
- If they have no linear relationship → correlation close to 0
- Example: Taller people tend to weigh more → positive correlation between Height and Weight.
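The cases above can be sketched with two tiny tables. All numbers below are made up purely for illustration, and the `UnitsSold` column is a hypothetical addition that does not appear in the notebook's data:

```python
import pandas as pd

# Made-up heights (cm) and weights (kg): both rise together
people = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Height': [150, 160, 170, 180, 190],
    'Weight': [52, 58, 67, 75, 86],
})
height_weight = people['Height'].corr(people['Weight'])  # Pearson correlation by default

# Made-up prices and units sold: one rises while the other falls
shop = pd.DataFrame({'Price': [1.5, 3.0, 12.5], 'UnitsSold': [90, 60, 10]})
price_units = shop['Price'].corr(shop['UnitsSold'])

# numeric_only=True skips the text column 'Name'
matrix = people.corr(numeric_only=True)

print(height_weight > 0)  # positive correlation
print(price_units < 0)    # negative correlation
print(matrix)
```

The diagonal of the matrix is always 1.0, since every column is perfectly correlated with itself. This is also why the single-column output earlier shows `Price` against `Price` as 1.0.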

### Standard Deviation (STD):
It measures how spread out the numbers are from the average (mean).

- A small std means values are close together.
- A large std means values are very spread out.

| Prices        | Mean  | STD Interpretation        |
| ------------- | ----- | ------------------------- |
| 10, 10, 10, 10 | 10    | STD = 0 (no spread)       |
| 1, 10, 20, 30  | 15.25 | STD ≈ 12.53 (big spread)  |
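The two rows of the table can be reproduced directly. A minimal sketch, keeping in mind that pandas computes the sample standard deviation by default (`ddof=1`, dividing by n - 1); passing `ddof=0` gives the population version instead:

```python
import pandas as pd

flat = pd.Series([10, 10, 10, 10])
spread = pd.Series([1, 10, 20, 30])

flat_std = flat.std()                  # 0.0: every value equals the mean
spread_mean = spread.mean()            # 15.25
spread_std = spread.std()              # sample std (ddof=1), about 12.53
spread_std_pop = spread.std(ddof=0)    # population std (ddof=0), about 10.85

print(flat_std, spread_mean)
print(round(spread_std, 2), round(spread_std_pop, 2))
```

The gap between the two versions shrinks as the number of values grows, so the choice of `ddof` matters most for small samples like this one.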
