<a href="https://colab.research.google.com/github/abdel2ty/IntenseAI_Notebooks_v1/blob/main/notebook1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div style="background-color:#28a745; padding:15px; border-radius:10px; text-align:center; display:flex; justify-content:center; align-items:center; gap:12px;">
    <img src="attachment:5e80712b-9e10-45a1-8362-bebcb6a71eac.jpg" alt="Evergreen Logo" style="width:70px; height:70px; border-radius:50%;">
    <h1 style="color:white; font-size:40px; font-weight:bold; margin:0;">
        Evergreen.ai
    </h1>

<div style="margin-left:auto; padding-right:10px;">
        <h2 style="color:white; font-size:40px; font-weight:600; margin:0;">
            Pandas
        </h2>
</div>
</div>




## **1. What is Pandas?**

Pandas is a **Python library** used for **data analysis** and **data manipulation**.
It provides **fast, flexible, and powerful** tools to work with structured data, especially **tables** (like Excel sheets or SQL tables).

* **Name origin:** *Pandas* comes from **Pan**el **Da**ta, an econometrics term for structured, multi-dimensional datasets.
* Built on top of **NumPy** → works very well with numerical data.

---

## **2. Why Use Pandas?**

Without Pandas, handling large tabular datasets in plain Python means writing your own loops to parse, filter, and aggregate the data.
Pandas makes it easy to:

* **Read** data (CSV, Excel, SQL, JSON…)
* **Clean** data (remove duplicates, handle missing values…)
* **Analyze** data (find averages, sums, counts…)
* **Visualize** data (with Matplotlib & Seaborn)
* **Export** data (save back to CSV, Excel, SQL…)

Example:
Imagine you have an **Excel sheet** with **10,000+ rows**.
With Pandas, you can load, filter, and summarize it in **just a few lines of code**.
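
As a rough illustration of how short such a workflow can be, the sketch below cleans and summarizes a tiny in-memory table. The column names and values are made up for this example and stand in for a much larger spreadsheet.

```python
import pandas as pd

# Hypothetical sample data standing in for a large spreadsheet
sales = pd.DataFrame({
    "product": ["A", "B", "A", "B", "A"],
    "units": [30, 21, 30, None, 41],
})

# Clean: drop exact duplicate rows, then rows with missing values
clean = sales.drop_duplicates().dropna()

# Analyze: total units per product
totals = clean.groupby("product")["units"].sum()
print(totals)
```

The same three steps would apply unchanged to a table with thousands of rows.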

---

## **3. How to Install Pandas**

```bash
pip install pandas
```

Then import it:

```python
import pandas as pd
```

*(By convention, Pandas is imported under the alias `pd`.)*

---

## **4. Pandas Core Data Structures**

Pandas mainly has **two** key data structures:

| Feature     | **Series** 🟢          | **DataFrame** 🔵 |
| ----------- | ---------------------- | ---------------- |
| Type        | 1D (one column)        | 2D (table)       |
| Similar To  | Single column in Excel | Full Excel sheet |
| Data Labels | Index labels           | Rows + Columns   |
| Best For    | One-dimensional data   | Tabular data     |

---

### **4.1. Pandas Series** → *(1D data)*

Think of it like a **single column** in Excel.

#### **Example**

```python
import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40]
s = pd.Series(data)

print(s)
```

**Output**

```
0    10
1    20
2    30
3    40
dtype: int64
```

🔹 Left column → **index** (automatically generated: 0,1,2,3)
🔹 Right column → **values**

#### **Accessing Series data**

```python
print(s[2])       # Get single value → 30
print(s[1:3])     # Get range → index 1 to 2
```
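
Plain brackets behave differently once the index is no longer the default 0, 1, 2, so it may help to see the two explicit accessors side by side. This is a minimal sketch; the labels `"a"` to `"d"` are chosen only for illustration.

```python
import pandas as pd

# Hypothetical labels chosen for illustration
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s.loc["c"])      # label-based lookup
print(s.iloc[2])       # position-based lookup
print(s.iloc[1:3])     # positions 1 and 2 (end is excluded)
print(s.loc["b":"c"])  # labels "b" through "c" (end is included)
```

Using `.loc` for labels and `.iloc` for positions makes the intent clear regardless of what the index looks like.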

---

### **4.2. Pandas DataFrame** → *(2D data)*

Think of it like an **Excel table**.

#### **Example**

```python
import pandas as pd

data = {
    "Name": ["Ali", "Omar", "Mona"],
    "Age": [25, 30, 22],
    "City": ["Cairo", "Giza", "Alex"]
}

df = pd.DataFrame(data)
print(df)
```

**Output**

```
   Name  Age   City
0   Ali   25  Cairo
1  Omar   30   Giza
2  Mona   22   Alex
```

🔹 Each column → **Series**
🔹 Whole table → **DataFrame**
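
The relationship between the two structures can be checked directly. The sketch below reuses part of the table above and inspects the type returned by each kind of column selection.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Omar", "Mona"],
    "Age": [25, 30, 22],
})

ages = df["Age"]                    # single brackets select one column
print(type(ages).__name__)          # the column comes back as a Series
print(type(df[["Age"]]).__name__)   # double brackets keep a DataFrame
print(ages.mean())                  # Series methods work on the column
```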

---

## **5. Common Pandas Operations**

Now that we have a DataFrame, let’s **work with it**:

### **5.1. Read & Write Data**

```python
df = pd.read_csv("data.csv")     # Read a CSV file into a DataFrame
df.to_excel("output.xlsx")       # Save to Excel (requires the openpyxl package)
```
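
To try reading and writing without an existing file, a round trip through a temporary CSV is one option. This is a sketch under the assumption that a throwaway file is acceptable; the file name `people.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Ali", "Omar"], "Age": [25, 30]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")  # hypothetical file name
    df.to_csv(path, index=False)            # index=False skips the row labels
    loaded = pd.read_csv(path)              # read the same file back

print(loaded)
```

Passing `index=False` keeps the row labels out of the file, so reading it back does not produce an extra unnamed column.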

---

### **5.2. View Data**

```python
print(df.head())   # First 5 rows
print(df.tail())   # Last 5 rows
print(df.shape)    # (rows, columns)
```

# 🏗️ **Python & Pandas DataFrame**

## **1️⃣ What is a Constructor in Python?**
In **Object-Oriented Programming (OOP)**, a **constructor** is a **special method** used to **create an object** from a **class**.

- A **class** acts like a blueprint.
- A **constructor** is used to **instantiate objects** from that blueprint.
- In Python, the constructor method is usually defined as `__init__()` inside a class.
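
A minimal sketch of these three points, using a hypothetical `Student` class that exists only for this illustration:

```python
class Student:
    """A minimal class used only to illustrate a constructor."""

    def __init__(self, name, age):
        # __init__ runs automatically when Student(...) is called
        self.name = name
        self.age = age


student = Student("Ali", 23)   # calling the class creates an object
print(student.name, student.age)
```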

---

## **2️⃣ Pandas and the DataFrame Constructor**

In the **Pandas** library:
- The **class** is called **`DataFrame`**.
- When we write **`pd.DataFrame()`**, we are **calling the constructor** of the **DataFrame** class.
- This **creates a DataFrame object**, which is a **2D, table-like structure** similar to **Excel** or **SQL tables**.

---




## **3️⃣ Creating a DataFrame Using the Constructor**

### **🔹 Example 1 — From a Dictionary**
```python
import pandas as pd

data = {
    "Name": ["Ali", "Sara", "Omar"],
    "Age": [23, 25, 21],
    "Grade": [88, 92, 79]
}

df = pd.DataFrame(data)
df
```

**Output**

```
   Name  Age  Grade
0   Ali   23     88
1  Sara   25     92
2  Omar   21     79
```

The constructor also accepts an `index` argument that sets custom row labels:

```python
df_users = pd.DataFrame({
    'Name': ["Ahmed", "Mahmoud", "Alaa"],
    'Smoke': [0, 1, 1],
    'Age': [20, 11, 10]
}, index=['User1', 'User1', 'User1'])

df_users
```

**Output**

```
          Name  Smoke  Age
User1    Ahmed      0   20
User1  Mahmoud      1   11
User1     Alaa      1   10
```

# 📌 **Pandas Series**

## **1️⃣ What is a Series?**
In **Pandas**, a **Series** is a **one-dimensional (1D) data structure** that can hold a **sequence of data values**.  
Think of it like:
- A **column** in an **Excel sheet**.
- A **list** but with **labels (index)**.
- The **building block** of a DataFrame.

> If a **DataFrame** is like a **table**, then a **Series** is like a **single column** from that table.

---




## **2️⃣ Creating a Series from a List**
You can create a **Pandas Series** using the **`pd.Series()`** constructor:

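```python
import pandas as pd

# Creating a Series from a Python list
data = [10, 20, 30, "40"]
series = pd.Series(data)

series
```

**Output**

```
0    10
1    20
2    30
3    40
dtype: object
```

Because the list mixes integers with the string `"40"`, Pandas stores the values with the general-purpose `object` dtype instead of `int64`.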
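
If the string was not intended, one way to recover a numeric Series is `pd.to_numeric`. This is a minimal sketch, not part of the original notebook, and it assumes every value can be parsed as a number.

```python
import pandas as pd

mixed = pd.Series([10, 20, 30, "40"])   # one string forces dtype object
numeric = pd.to_numeric(mixed)          # parse every value as a number

print(mixed.dtype)
print(numeric.dtype)
print(numeric.sum())
```

Checking `dtype` right after creating or loading data is a quick way to spot values that were read as text.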

# 📂 **Reading Data Files in Pandas**

## **1️⃣ Why Read Data Files?**
While creating **DataFrames** and **Series** by hand is useful for learning,  
**in real-world scenarios**, we usually **don’t manually create datasets**.  
Instead, we work with **existing data files** that are **stored** in different formats.

Some common formats include:
- **CSV** → Comma-Separated Values *(most common)*
- **Excel** → `.xlsx` files
- **JSON** → JavaScript Object Notation
- **SQL Databases**
- **Parquet** and others

Among these, **CSV** is the simplest and one of the most widely used formats.

---

## **2️⃣ What is a CSV File?**
A **CSV** (Comma-Separated Values) file stores data **in a plain text table**  
where **columns** are separated by **commas**.

### **Example of a CSV File**
| Product A | Product B | Product C |
|----------|-----------|-----------|
|   30     |    21     |    9      |
|   35     |    34     |    1      |
|   41     |    11     |   11      |



### **Explanation**
- Each **line** represents a **row**.
- Each **value** in a row is separated by a **comma**.
- The **first line** usually contains **column names** *(header)*.
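
To make the raw format concrete, the sketch below writes the table above as plain comma-separated text and parses it with Pandas. Using `io.StringIO` in place of a real file is an assumption made so the example runs without any file on disk.

```python
import io

import pandas as pd

# Raw CSV text: the header line, then one line per row
csv_text = """Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
"""

# io.StringIO lets read_csv treat the string like an open file
products = pd.read_csv(io.StringIO(csv_text))
print(products)
```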

---

## **3️⃣ Reading CSV Files with Pandas**

The easiest way to read a CSV file in **Pandas** is by using the  
**`pd.read_csv()`** function.

### **Example**
```python
import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv("Covid19-counties-2023.csv")

# Display the first 5 rows
print(df.head())
```

Evaluating `df` on its own in a notebook cell shows the beginning and the end of the full table:

```python
df = pd.read_csv(r"Covid19-counties-2023.csv")
df
```

**Output**

```
             date      county    state     fips  cases  deaths
0        1/1/2023     Autauga  Alabama   1001.0  18961   230.0
1        1/1/2023     Baldwin  Alabama   1003.0  67496   719.0
2        1/1/2023     Barbour  Alabama   1005.0   7027   111.0
3        1/1/2023        Bibb  Alabama   1007.0   7692   108.0
4        1/1/2023      Blount  Alabama   1009.0  17731   260.0
...           ...         ...      ...      ...    ...     ...
267004  3/23/2023  Sweetwater  Wyoming  56037.0  12519   139.0
267005  3/23/2023       Teton  Wyoming  56039.0  12150    16.0
267006  3/23/2023       Uinta  Wyoming  56041.0   6416    43.0
267007  3/23/2023    Washakie  Wyoming  56043.0   2700    51.0
```

# 🐍 **Understanding the `r` Prefix in Python**

The **`r` prefix** in Python stands for **"raw string"**.  
It tells Python **not to treat backslashes (`\`) as escape characters**, which is especially useful when working with **Windows file paths**, **regular expressions**, or **strings containing many backslashes**.

---

## **1️⃣ What is the `r` Prefix?**

### 🔹 **Definition**
- `r"..."` → **Raw string literal**.
- Prevents Python from interpreting escape sequences like `\n`, `\t`, or `\U`.

### 🔹 **Why It's Useful**
- **File paths** → Avoids errors when working with **Windows-style paths**.
- **Regex patterns** → Keeps backslashes intact.
- **Complex strings** → Helps when multiple `\` are involved.
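
The difference is easy to verify by comparing string lengths. The sketch below uses a short made-up string rather than a file path.

```python
normal = "a\tb"   # \t is interpreted as a single tab character
raw = r"a\tb"     # the backslash and the t are kept as two characters

print(len(normal))
print(len(raw))
print(raw == "a\\tb")   # a raw string equals the doubled-backslash form
```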

---

## **2️⃣ Without `r` — Normal String Behavior**

```python
path = "C:\new_folder\data.csv"
print(path)
```

**Output**

```
C:
ew_folder\data.csv
```

Python reads `\n` as a newline, so the printed path breaks across two lines.

With the `r` prefix, the backslashes are kept exactly as written:

```python
path = r"C:\new_folder\data.csv"
print(path)
```

**Output**

```
C:\new_folder\data.csv
```

Forward slashes also work, because they need no escaping:

```python
path = "C:/new_folder/data.csv"
print(path)
```

The same rules apply when passing a path to `pd.read_csv()`. A raw string loads the file:

```python
df = pd.read_csv(r"C:\Users\Nima\Downloads\data Analysis\New folder\Covid19-counties-2023.csv")
df
```

A normal string with backslashes fails before the file is even opened, because `\U` starts a Unicode escape sequence:

```python
df = pd.read_csv("C:\Users\Nima\Downloads\data Analysis\New folder\Covid19-counties-2023.csv")
df
```

**Output**

```
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape (4112858292.py, line 1)
```

Forward slashes load the file as well:

```python
df = pd.read_csv("C:/Users/Nima/Downloads/data Analysis/New folder/Covid19-counties-2023.csv")
df
```

The raw-string and forward-slash versions both display the same DataFrame:

```
             date      county    state     fips  cases  deaths
0        1/1/2023     Autauga  Alabama   1001.0  18961   230.0
1        1/1/2023     Baldwin  Alabama   1003.0  67496   719.0
2        1/1/2023     Barbour  Alabama   1005.0   7027   111.0
3        1/1/2023        Bibb  Alabama   1007.0   7692   108.0
4        1/1/2023      Blount  Alabama   1009.0  17731   260.0
...           ...         ...      ...      ...    ...     ...
267004  3/23/2023  Sweetwater  Wyoming  56037.0  12519   139.0
267005  3/23/2023       Teton  Wyoming  56039.0  12150    16.0
267006  3/23/2023       Uinta  Wyoming  56041.0   6416    43.0
267007  3/23/2023    Washakie  Wyoming  56043.0   2700    51.0
```


| **Example**                 | **Behavior**                  | **Recommended?** |
| --------------------------- | ----------------------------- | ---------------- |
| `"C:\new_folder\data.csv"`  | ❌ Wrong → `\n` causes newline | No               |
| `r"C:\new_folder\data.csv"` | ✅ Correct path                | Yes              |
| `"C:/new_folder/data.csv"`  | ✅ Works fine too              | Yes              |
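
Another option, not covered in this notebook, is the standard-library `pathlib` module, which treats both slash styles as the same path. The sketch below uses `PureWindowsPath` so that it behaves the same on any operating system; the folder and file names are the placeholder ones from the table above.

```python
from pathlib import PureWindowsPath

# PureWindowsPath parses Windows-style paths on any operating system
with_backslashes = PureWindowsPath(r"C:\new_folder\data.csv")
with_slashes = PureWindowsPath("C:/new_folder/data.csv")

print(with_backslashes == with_slashes)   # both spellings name the same path
print(with_slashes.name)                  # final component of the path
```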



# 🐼 **Pandas: `head()`, `tail()`, and `shape`**

When working with **pandas DataFrames**, it's important to **inspect** and **understand** your dataset quickly.  
Three commonly used tools for this are: **`head()`**, **`tail()`**, and **`shape`**.

---

## **1️⃣ `df.head()` → View the First Rows**

### 🔹 **Definition**
- The **`head()`** method **returns the first 5 rows** of a DataFrame **by default**.
- You can specify **how many rows** you want to see.

### 🔹 **Syntax**
```python
df.head(n)
```

* **`n`** → Number of rows to display (**default = 5**).

### 🔹 **Example**

```python
import pandas as pd

df = pd.read_csv("Covid19-counties-2023.csv")
df.head(10)   # Show first 10 rows
```

**Output**

```
       date    county    state    fips  cases  deaths
0  1/1/2023   Autauga  Alabama  1001.0  18961   230.0
1  1/1/2023   Baldwin  Alabama  1003.0  67496   719.0
2  1/1/2023   Barbour  Alabama  1005.0   7027   111.0
3  1/1/2023      Bibb  Alabama  1007.0   7692   108.0
4  1/1/2023    Blount  Alabama  1009.0  17731   260.0
5  1/1/2023   Bullock  Alabama  1011.0   2886    54.0
6  1/1/2023    Butler  Alabama  1013.0   6185   130.0
7  1/1/2023   Calhoun  Alabama  1015.0  39458   665.0
8  1/1/2023  Chambers  Alabama  1017.0  10311   174.0
9  1/1/2023  Cherokee  Alabama  1019.0   6456   133.0
```

---



## **2️⃣ `df.tail()` → View the Last Rows**

### 🔹 **Definition**

* The **`tail()`** method **returns the last 5 rows** of a DataFrame **by default**.
* Like **`head()`**, you can pass a number to specify how many rows you want.

### 🔹 **Syntax**

```python
df.tail(n)
```

* **`n`** → Number of rows to display (**default = 5**).

### 🔹 **Example**


```python
df.tail()
```

**Output**

```
             date      county    state     fips  cases  deaths
267004  3/23/2023  Sweetwater  Wyoming  56037.0  12519   139.0
267005  3/23/2023       Teton  Wyoming  56039.0  12150    16.0
267006  3/23/2023       Uinta  Wyoming  56041.0   6416    43.0
267007  3/23/2023    Washakie  Wyoming  56043.0   2700    51.0
267008  3/23/2023      Weston  Wyoming  56045.0   1906    23.0
```

```python
df.tail(3)   # Show last 3 rows
```

**Output**

```
             date    county    state     fips  cases  deaths
267006  3/23/2023     Uinta  Wyoming  56041.0   6416    43.0
267007  3/23/2023  Washakie  Wyoming  56043.0   2700    51.0
267008  3/23/2023    Weston  Wyoming  56045.0   1906    23.0
```


---

## **3️⃣ `df.shape` → Get Dataset Dimensions**

### 🔹 **Definition**

* The **`shape`** attribute tells you the **number of rows** and **columns** in the DataFrame.
* Returns a **tuple**:
  **`(number_of_rows, number_of_columns)`**

### 🔹 **Syntax**

```python
df.shape
```

### 🔹 **Example**

```python
df.shape
```

**Output:**

```
(267009, 6)
```

This means:

* **267009 rows** (observations)
* **6 columns** (features)

---

## **4️⃣ Quick Comparison Table**

| **Method / Attribute** | **Purpose**            | **Default** | **Returns**    | **Example**   |
| ---------------------- | ---------------------- | ----------- | -------------- | ------------- |
| `df.head()`            | Shows **first n rows** | 5 rows      | DataFrame      | `df.head(10)` |
| `df.tail()`            | Shows **last n rows**  | 5 rows      | DataFrame      | `df.tail(3)`  |
| `df.shape`             | Shows **dataset size** | N/A         | Tuple `(r, c)` | `df.shape`    |
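
The three tools can be tried together on a small table. The sketch below assumes a made-up 8-row DataFrame so that it runs without the COVID-19 file.

```python
import pandas as pd

# Hypothetical 8-row dataset for illustration
df = pd.DataFrame({"n": range(8), "square": [i * i for i in range(8)]})

rows, cols = df.shape              # unpack the (rows, columns) tuple
print(rows, cols)

print(df.head(3)["n"].tolist())    # values from the first three rows
print(df.tail(2)["n"].tolist())    # values from the last two rows
print(len(df.head()))              # with no argument, head() returns 5 rows
```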

---

## **📌 Summary**

* ✅ Use **`df.head()`** → To **peek** at the start of the dataset.
* ✅ Use **`df.tail()`** → To **check the end** of the dataset.
* ✅ Use **`df.shape`** → To **understand dataset size**.

These two methods and one attribute are usually the first things to run when **exploring a dataset** in **pandas**.

---


```python
df.shape
```

**Output:**

```
(267009, 6)
```

<h3>🎥 Recording</h3>

[Watch on YouTube](https://youtu.be/_KpMRk7eO_E?si=Lexie3tuS_TT3EaF)


<h3>📂 Data</h3>

[Download the dataset](https://drive.google.com/file/d/1L98bSjf9_PU9Iwzs8HUcCox3Un3mmSBr/view?usp=drive_link)

<div style="background-color:#28a745; padding:15px; border-radius:10px; text-align:center; display:flex; justify-content:center; align-items:center; gap:12px;">
    <img src="attachment:5e80712b-9e10-45a1-8362-bebcb6a71eac.jpg" alt="Evergreen Logo" style="width:70px; height:70px; border-radius:50%;">
    <h1 style="color:white; font-size:40px; font-weight:bold; margin:0;">
        Evergreen.ai
    </h1>

<div style="margin-left:auto; padding-right:10px;">
        <h2 style="color:white; font-size:30px; font-weight:600; margin:0;">
            Eng / Mahmoud Talaat ->
            Thanks
        </h2>
        <h2 style="color:white; font-size:30px; font-weight:600; margin:0;">
            01146544662
        </h2>
</div>
</div>
