<a href="https://colab.research.google.com/github/abdel2ty/IntenseAI_Notebooks_v1/blob/main/pandas_notebook2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div style="background-color:#28a745; padding:15px; border-radius:10px; text-align:center; display:flex; justify-content:center; align-items:center; gap:12px;">
    <img src="attachment:b07ea984-a5b2-4f0c-b2db-25fb9fce14ba.jpg" alt="Evergreen Logo" style="width:70px; height:70px; border-radius:50%;">
    <h1 style="color:white; font-size:40px; font-weight:bold; margin:0;">
        Evergreen.ai
    </h1>

<div style="margin-left:auto; padding-right:10px;">
        <h2 style="color:white; font-size:40px; font-weight:600; margin:0;">
            Pandas
        </h2>
</div>
</div>



### **Explanation**
- Each **line** represents a **row**.
- Each **value** in a row is separated by a **comma**.
- The **first line** usually contains **column names** *(header)*.
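
To make these three rules concrete, here is a minimal sketch that splits a small CSV string with Python's built-in `csv` module. The sample text is an assumption for illustration only; it borrows a few column names and the first two rows of the COVID dataset used later in this notebook.

```python
import csv
import io

# Illustrative CSV text: one header line followed by two data rows.
csv_text = (
    "date,county,state,cases\n"
    "1/1/2023,Autauga,Alabama,18961\n"
    "1/1/2023,Baldwin,Alabama,67496\n"
)

rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

print(header)      # column names taken from the first line
print(len(data))   # number of data rows
print(data[0])     # values of the first row, split on commas
```

Every value comes back as a string here; converting text into numbers and dates is part of what a dedicated CSV reader adds on top.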

---

## **3️⃣ Reading CSV Files with Pandas**

The easiest way to read a CSV file in **Pandas** is by using the  
**`pd.read_csv()`** function.

### **Example**


In [None]:
import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv("Covid19-counties-2023.csv")

# Display the first 5 rows
df.head()

Unnamed: 0,date,county,state,fips,cases,deaths
0,1/1/2023,Autauga,Alabama,1001.0,18961,230.0
1,1/1/2023,Baldwin,Alabama,1003.0,67496,719.0
2,1/1/2023,Barbour,Alabama,1005.0,7027,111.0
3,1/1/2023,Bibb,Alabama,1007.0,7692,108.0
4,1/1/2023,Blount,Alabama,1009.0,17731,260.0
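
`pd.read_csv()` also accepts optional parameters that control what gets loaded. The sketch below is only an illustration: it reads from an in-memory string (`io.StringIO`) instead of the real file so that it runs on its own, and the choice of `usecols`, `nrows`, and `parse_dates` is an assumption about what might be useful, not something this dataset requires.

```python
import io
import pandas as pd

# Stand-in for the CSV file: the header plus the first three rows shown above.
csv_text = """date,county,state,fips,cases,deaths
1/1/2023,Autauga,Alabama,1001.0,18961,230.0
1/1/2023,Baldwin,Alabama,1003.0,67496,719.0
1/1/2023,Barbour,Alabama,1005.0,7027,111.0
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["date", "county", "cases"],  # load only these columns
    nrows=2,                               # stop after the first 2 data rows
    parse_dates=["date"],                  # convert the date column to datetimes
)

print(sample)
print(sample.dtypes)
```

With the real file, the same keyword arguments can be passed alongside the file name.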


In [None]:
df.county

0            Autauga
1            Baldwin
2            Barbour
3               Bibb
4             Blount
             ...    
267004    Sweetwater
267005         Teton
267006         Uinta
267007      Washakie
267008        Weston
Name: county, Length: 267009, dtype: object

In [None]:
df["county"]

0            Autauga
1            Baldwin
2            Barbour
3               Bibb
4             Blount
             ...    
267004    Sweetwater
267005         Teton
267006         Uinta
267007      Washakie
267008        Weston
Name: county, Length: 267009, dtype: object

In [None]:
df["county"][0]

'Autauga'

# 🐼 **Accessing Data in Pandas: Attributes, Columns, and Values**

In **Python**, objects have **properties** (also called **attributes**).
For example:

```python
book.title
```

Here:

* `book` → object
* `title` → property of the object

Pandas **DataFrames** behave similarly because columns can be treated **like object properties** in some cases.

---

## **1️⃣ Accessing a Column in a DataFrame**

There are **two main ways** to access a column (Series) from a **DataFrame**:

### **Method 1 — Using Dot Notation (`df.column`)**

```python
df.county
```

* Short and clean syntax.
* Works only **if the column name**:

  * Has **no spaces**.
  * Has **no special characters**.
  * Is **not the same as a Pandas method name**.

---

### **Method 2 — Using Bracket Notation (`df["column"]`)** ✅ **Recommended**

```python
df["county"]
```

* More **powerful and flexible**.
* Works even if:

  * The column name has **spaces** → `df["country providence"]`
  * The column name has **special characters**.
  * The column name conflicts with **method names**.
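
A short sketch of the difference between the two notations. The frame below is hypothetical: the column `count` was chosen because it clashes with the `DataFrame.count` method, and `total cases` because it contains a space.

```python
import pandas as pd

# Hypothetical frame: "count" clashes with the DataFrame.count method,
# and "total cases" contains a space.
demo = pd.DataFrame({
    "county": ["Autauga", "Baldwin"],
    "count": [1, 2],
    "total cases": [18961, 67496],
})

print(demo.county.equals(demo["county"]))  # both notations return the same Series
print(callable(demo.count))                # dot notation finds the method, not the column
print(demo["count"].tolist())              # bracket notation returns the column
print(demo["total cases"].tolist())        # bracket notation handles the space
```

Because bracket notation works in every one of these cases, it is the safer habit.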

---



## **2️⃣ Accessing a Single Value in a Column**

Once you select a column, Pandas returns a **Series** (1D data).
You can then select a **specific value** using indexing.

### **Example**

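In [None]:
df["county"][0]

'Autauga'

* `df["county"]` → returns the **Series** containing all county names.
* `[0]` → selects the value whose **index label** is `0`, which is the **first value** here because the DataFrame uses the default integer index.

Dot notation breaks as soon as a column name contains a space, for example a hypothetical column named `country providence`:

```python
# df.country providence    # ❌ Invalid syntax (left commented out so the cell still runs)
df["country providence"]   # ✅ Works fine, provided such a column exists
```
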

# 🐼 **Pandas Indexing Cheat Sheet**

### **Mastering `.loc`, `.iloc`, `.at`, and `.iat`**

Pandas provides **four main techniques** for selecting **rows**, **columns**, and **specific values** from a **DataFrame**.

---

## **1️⃣ Using `.loc` → Label-Based Indexing**

### **📌 What it does**

* Selects **rows and columns** **by their labels** (names).
* Can be used with:

  * **Single labels**
  * **Lists of labels**
  * **Label ranges**
  * **Boolean masks**

### **📌 Syntax**

```python
df.loc[rows, columns]
```

### **📌 Examples**

#### **a) Select all columns for a single row by label**






In [None]:
df.loc[0]

date      1/1/2023
county     Autauga
state      Alabama
fips        1001.0
cases        18961
deaths       230.0
Name: 0, dtype: object

#### **b) Select a single column for a single row**

In [None]:
df.loc[0, "county"]

'Autauga'

#### **c) Select multiple columns for a single row**

In [None]:
df.loc[0, ["date", "cases", "deaths"]]

date      1/1/2023
cases        18961
deaths       230.0
Name: 0, dtype: object

#### **d) Select multiple rows by index labels**


In [None]:
df.loc[[0, 3, 5], ["county", "state", "cases"]]

Unnamed: 0,county,state,cases
0,Autauga,Alabama,18961
3,Bibb,Alabama,7692
5,Bullock,Alabama,2886


#### **e) Select rows using a label range**

In [None]:
df.loc[0:5, ["date", "cases"]]

Unnamed: 0,date,cases
0,1/1/2023,18961
1,1/1/2023,67496
2,1/1/2023,7027
3,1/1/2023,7692
4,1/1/2023,17731
5,1/1/2023,2886
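
The output above has six rows because a `.loc` slice **includes its end label**, unlike ordinary Python slicing. The sketch below contrasts this with `.iloc` on a small stand-in frame (an assumption so the example runs without the CSV file; the values are the first six counties shown above).

```python
import pandas as pd

demo = pd.DataFrame({
    "county": ["Autauga", "Baldwin", "Barbour", "Bibb", "Blount", "Bullock"],
    "cases": [18961, 67496, 7027, 7692, 17731, 2886],
})

by_label = demo.loc[0:5]      # labels 0 through 5, end label included
by_position = demo.iloc[0:5]  # positions 0 through 4, end position excluded

print(len(by_label), len(by_position))

# With a string index, .loc slices between labels in the same inclusive way.
by_name = demo.set_index("county").loc["Baldwin":"Bibb"]
print(by_name.index.tolist())
```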


## **2️⃣ Using `.iloc` → Position-Based Indexing**

### **📌 What it does**

* Selects **rows and columns** **by their integer position** (like Python lists).
* Cannot use column names directly, only **numeric indices**.

### **📌 Syntax**

```python
df.iloc[rows, columns]
```

### **📌 Examples**
                                                              
#### **a) Select the first row**



In [None]:
df.iloc[0]

date      1/1/2023
county     Autauga
state      Alabama
fips        1001.0
cases        18961
deaths       230.0
Name: 0, dtype: object

#### **b) Select the first value in the "county" column**

In [None]:
df.iloc[0, 1]

'Autauga'

#### **c) Select a range of rows**

In [None]:
df.iloc[0:5]

Unnamed: 0,date,county,state,fips,cases,deaths
0,1/1/2023,Autauga,Alabama,1001.0,18961,230.0
1,1/1/2023,Baldwin,Alabama,1003.0,67496,719.0
2,1/1/2023,Barbour,Alabama,1005.0,7027,111.0
3,1/1/2023,Bibb,Alabama,1007.0,7692,108.0
4,1/1/2023,Blount,Alabama,1009.0,17731,260.0


#### **d) Select multiple rows and columns**


In [None]:
df.iloc[[0, 2, 5], [0, 1, 4]]

Unnamed: 0,date,county,cases
0,1/1/2023,Autauga,18961
2,1/1/2023,Barbour,7027
5,1/1/2023,Bullock,2886
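
Since `.iloc` follows Python's list rules, negative positions and open-ended slices work too. A minimal sketch on a small stand-in frame (an assumption, built from rows shown above so the example is self-contained):

```python
import pandas as pd

demo = pd.DataFrame({
    "date": ["1/1/2023"] * 3,
    "county": ["Autauga", "Baldwin", "Barbour"],
    "cases": [18961, 67496, 7027],
})

last_row = demo.iloc[-1]           # last row as a Series
first_two_cols = demo.iloc[:, :2]  # every row, first two columns
last_cell = demo.iloc[-1, -1]      # bottom-right value

print(last_row["county"])
print(first_two_cols.columns.tolist())
print(last_cell)
```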



## **3️⃣ Using `.at` → Fast Access by Label**

### **📌 What it does**

* Retrieves a **single scalar value** using **row label** + **column label**.
* Typically **faster** than `.loc` for this purpose because it is optimized for **single-cell access** and skips the extra work `.loc` does to support slices, lists, and masks.

### **📌 Syntax**

```python
df.at[row_label, column_label]
```

⚡ **Performance Tip:** Use `.at` instead of `.loc` when accessing **only one value**.

---

## **4️⃣ Using `.iat` → Fast Access by Position**

### **📌 What it does**

* Retrieves a **single scalar value** using **row index** + **column index**.
* Similar to `.at`, but uses **integer-based positions**.

### **📌 Syntax**

```python
df.iat[row_index, column_index]
```

### **📌 Example (`.at` and `.iat`)**


In [None]:
df.at[2, "cases"]

np.int64(7027)

In [None]:
df.iat[0, 4]

np.int64(18961)


## **5️⃣ Comparison Table**

| **Feature**                    | **.loc** 🟢                | **.iloc** 🔵               | **.at** 🟡            | **.iat** 🟣              |
| ------------------------------ | -------------------------- | -------------------------- | --------------------- | ------------------------ |
| **Index type**                 | Labels (names)             | Integer positions          | Labels                | Integer positions        |
| **Returns**                    | DataFrame / Series / Value | DataFrame / Series / Value | **Single Value**      | **Single Value**         |
| **Supports multiple rows?**    | ✅ Yes                      | ✅ Yes                      | ❌ No                  | ❌ No                     |
| **Supports multiple columns?** | ✅ Yes                      | ✅ Yes                      | ❌ No                  | ❌ No                     |
| **Speed**                      | General-purpose, slower for one value | General-purpose, slower for one value | **Fastest for one value** | **Fastest for one value** |
| **Use case**                   | Label-based slicing        | Position-based slicing     | Single value by label | Single value by position |
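
One way to check the speed row on your own machine is a rough `timeit` comparison. This is only a sketch on a tiny stand-in frame (an assumption); the absolute numbers depend on the machine and the pandas version, so no expected timings are given.

```python
import timeit
import pandas as pd

demo = pd.DataFrame({
    "county": ["Autauga", "Baldwin", "Barbour"],
    "cases": [18961, 67496, 7027],
})

# All four accessors return the same scalar for the same cell.
values = [
    demo.loc[2, "cases"],
    demo.iloc[2, 1],
    demo.at[2, "cases"],
    demo.iat[2, 1],
]
print(values)

# Rough timing of 10,000 single-cell reads; the numbers vary by machine.
t_loc = timeit.timeit(lambda: demo.loc[2, "cases"], number=10_000)
t_at = timeit.timeit(lambda: demo.at[2, "cases"], number=10_000)
print(f".loc: {t_loc:.4f}s   .at: {t_at:.4f}s")
```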

---


## **6️⃣ Summary**

* Use **`.loc`** → When selecting **rows/columns by names**.
* Use **`.iloc`** → When selecting **rows/columns by positions**.
* Use **`.at`** → When getting **one value** **by label** (fastest).
* Use **`.iat`** → When getting **one value** **by position** (fastest).


In [None]:
df.loc[df["county"] == "Weston"]

Unnamed: 0,date,county,state,fips,cases,deaths
3254,1/1/2023,Weston,Wyoming,56045.0,1880,22.0
6509,1/2/2023,Weston,Wyoming,56045.0,1880,22.0
9763,1/3/2023,Weston,Wyoming,56045.0,1880,22.0
13015,1/4/2023,Weston,Wyoming,56045.0,1880,22.0
16267,1/5/2023,Weston,Wyoming,56045.0,1880,22.0
...,...,...,...,...,...,...
253982,3/19/2023,Weston,Wyoming,56045.0,1906,23.0
257238,3/20/2023,Weston,Wyoming,56045.0,1906,23.0
260493,3/21/2023,Weston,Wyoming,56045.0,1906,23.0
263751,3/22/2023,Weston,Wyoming,56045.0,1906,23.0


## **1️⃣ Filtering**
Pandas lets us filter rows by creating a **boolean mask**.

### **Example — Select rows where cases > 1900**
**Explanation**:

* `df["cases"] > 1900` → Returns a **Series of True/False values**.
* Rows where the condition is **True** are kept.
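
The two steps can be seen separately by storing the mask in a variable first. The frame below is a small stand-in (an assumption so the example runs without the CSV file); its values are taken from rows that appear in this notebook's outputs.

```python
import pandas as pd

demo = pd.DataFrame({
    "county": ["Autauga", "Weston", "Johnson", "Teton"],
    "state": ["Alabama", "Wyoming", "Wyoming", "Wyoming"],
    "cases": [18961, 1880, 2310, 12010],
})

mask = demo["cases"] > 1900   # boolean Series, one True/False per row
print(mask.tolist())
print(mask.sum())             # True counts as 1, so this is the number of matches

filtered = demo[mask]         # keep only the rows where the mask is True
print(filtered["county"].tolist())
```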


In [None]:
df[df["cases"] > 1900]

Unnamed: 0,date,county,state,fips,cases,deaths
0,1/1/2023,Autauga,Alabama,1001.0,18961,230.0
1,1/1/2023,Baldwin,Alabama,1003.0,67496,719.0
2,1/1/2023,Barbour,Alabama,1005.0,7027,111.0
3,1/1/2023,Bibb,Alabama,1007.0,7692,108.0
4,1/1/2023,Blount,Alabama,1009.0,17731,260.0
...,...,...,...,...,...,...
267004,3/23/2023,Sweetwater,Wyoming,56037.0,12519,139.0
267005,3/23/2023,Teton,Wyoming,56039.0,12150,16.0
267006,3/23/2023,Uinta,Wyoming,56041.0,6416,43.0
267007,3/23/2023,Washakie,Wyoming,56043.0,2700,51.0


## **2️⃣ Combining Multiple Conditions**

To combine multiple conditions, we use:

* **`&`** → AND
* **`|`** → OR
* **`~`** → NOT

### **Example — Select rows where:**

* Cases **> 1900**
* AND Deaths **> 22**




In [None]:
df[(df["cases"] > 1900) & (df["deaths"] > 22)]

Unnamed: 0,date,county,state,fips,cases,deaths
0,1/1/2023,Autauga,Alabama,1001.0,18961,230.0
1,1/1/2023,Baldwin,Alabama,1003.0,67496,719.0
2,1/1/2023,Barbour,Alabama,1005.0,7027,111.0
3,1/1/2023,Bibb,Alabama,1007.0,7692,108.0
4,1/1/2023,Blount,Alabama,1009.0,17731,260.0
...,...,...,...,...,...,...
267003,3/23/2023,Sublette,Wyoming,56035.0,2324,28.0
267004,3/23/2023,Sweetwater,Wyoming,56037.0,12519,139.0
267006,3/23/2023,Uinta,Wyoming,56041.0,6416,43.0
267007,3/23/2023,Washakie,Wyoming,56043.0,2700,51.0


In [None]:
df[(df["cases"] > 20000) | (df["deaths"] > 3000)]

Unnamed: 0,date,county,state,fips,cases,deaths
1,1/1/2023,Baldwin,Alabama,1003.0,67496,719.0
7,1/1/2023,Calhoun,Alabama,1015.0,39458,665.0
16,1/1/2023,Colbert,Alabama,1033.0,20165,275.0
21,1/1/2023,Cullman,Alabama,1043.0,30119,391.0
24,1/1/2023,DeKalb,Alabama,1049.0,21738,350.0
...,...,...,...,...,...,...
266981,3/23/2023,Waukesha,Wisconsin,55133.0,140273,1251.0
266984,3/23/2023,Winnebago,Wisconsin,55139.0,64184,418.0
266985,3/23/2023,Wood,Wisconsin,55141.0,27176,269.0
266996,3/23/2023,Laramie,Wyoming,56021.0,31702,322.0


In [None]:
df[~(df["state"] == "Wyoming")]

Unnamed: 0,date,county,state,fips,cases,deaths
0,1/1/2023,Autauga,Alabama,1001.0,18961,230.0
1,1/1/2023,Baldwin,Alabama,1003.0,67496,719.0
2,1/1/2023,Barbour,Alabama,1005.0,7027,111.0
3,1/1/2023,Bibb,Alabama,1007.0,7692,108.0
4,1/1/2023,Blount,Alabama,1009.0,17731,260.0
...,...,...,...,...,...,...
266981,3/23/2023,Waukesha,Wisconsin,55133.0,140273,1251.0
266982,3/23/2023,Waupaca,Wisconsin,55135.0,16431,266.0
266983,3/23/2023,Waushara,Wisconsin,55137.0,7672,94.0
266984,3/23/2023,Winnebago,Wisconsin,55139.0,64184,418.0
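
Each condition is wrapped in parentheses because `&` and `|` bind more tightly than comparisons such as `>`. The sketch below repeats the three operators on a small stand-in frame (an assumption; the thresholds for the OR example were adjusted to suit the tiny sample) and shows that Python's `and` keyword does not work on a Series.

```python
import pandas as pd

demo = pd.DataFrame({
    "state": ["Alabama", "Wyoming", "Wyoming", "Wyoming"],
    "cases": [18961, 1880, 2310, 12010],
    "deaths": [230.0, 22.0, 21.0, 16.0],
})

both = demo[(demo["cases"] > 1900) & (demo["deaths"] > 22)]     # AND
either = demo[(demo["cases"] > 10000) | (demo["deaths"] > 21)]  # OR
not_wyoming = demo[~(demo["state"] == "Wyoming")]               # NOT

print(len(both), len(either), len(not_wyoming))

# Python's `and` asks for a single True/False, which a Series cannot provide.
try:
    demo[(demo["cases"] > 1900) and (demo["deaths"] > 22)]
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)
```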



## **3️⃣ Using `.isin()`**

The **`.isin()`** method returns a **boolean mask** that is `True` wherever the column value **appears in a given list**, so it can be used to select those rows.

### **Example — Select rows where state is either Wyoming or Texas**





In [None]:
df[df["state"].isin(["Wyoming", "Texas"])]

Unnamed: 0,date,county,state,fips,cases,deaths
2628,1/1/2023,Anderson,Texas,48001.0,10771,252.0
2629,1/1/2023,Andrews,Texas,48003.0,4901,73.0
2630,1/1/2023,Angelina,Texas,48005.0,16747,497.0
2631,1/1/2023,Aransas,Texas,48007.0,5378,98.0
2632,1/1/2023,Archer,Texas,48009.0,2563,32.0
...,...,...,...,...,...,...
267004,3/23/2023,Sweetwater,Wyoming,56037.0,12519,139.0
267005,3/23/2023,Teton,Wyoming,56039.0,12150,16.0
267006,3/23/2023,Uinta,Wyoming,56041.0,6416,43.0
267007,3/23/2023,Washakie,Wyoming,56043.0,2700,51.0


In [None]:
df[~df["state"].isin(["Wyoming", "Texas"])]

Unnamed: 0,date,county,state,fips,cases,deaths
0,1/1/2023,Autauga,Alabama,1001.0,18961,230.0
1,1/1/2023,Baldwin,Alabama,1003.0,67496,719.0
2,1/1/2023,Barbour,Alabama,1005.0,7027,111.0
3,1/1/2023,Bibb,Alabama,1007.0,7692,108.0
4,1/1/2023,Blount,Alabama,1009.0,17731,260.0
...,...,...,...,...,...,...
266981,3/23/2023,Waukesha,Wisconsin,55133.0,140273,1251.0
266982,3/23/2023,Waupaca,Wisconsin,55135.0,16431,266.0
266983,3/23/2023,Waushara,Wisconsin,55137.0,7672,94.0
266984,3/23/2023,Winnebago,Wisconsin,55139.0,64184,418.0
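
`.isin()` is a shorter way of writing several `==` comparisons joined by `|`. A minimal sketch on a stand-in frame (an assumption, with one county per state taken from the outputs above):

```python
import pandas as pd

demo = pd.DataFrame({
    "county": ["Autauga", "Anderson", "Weston"],
    "state": ["Alabama", "Texas", "Wyoming"],
})

mask = demo["state"].isin(["Wyoming", "Texas"])
same_mask = (demo["state"] == "Wyoming") | (demo["state"] == "Texas")

print(mask.tolist())
print(mask.equals(same_mask))           # the two masks match
print(demo[~mask]["county"].tolist())   # rows from every other state
```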


## **4️⃣ Using `.isnull()` and `.notnull()`**

These methods help us handle **missing data (NaN)**.

### **Example — Select rows where `fips` is missing**

In [None]:
df[df["fips"].isnull()]

Unnamed: 0,date,county,state,fips,cases,deaths
63,1/1/2023,Unknown,Alabama,,4703,0.0
91,1/1/2023,Unknown,Alaska,,1822,0.0
96,1/1/2023,Unknown,American Samoa,,8266,34.0
182,1/1/2023,Unknown,Arkansas,,32705,0.0
317,1/1/2023,Unknown,Connecticut,,3159,9.0
...,...,...,...,...,...,...
266611,3/23/2023,Unknown,Texas,,10766,0.0
266661,3/23/2023,Unknown,Utah,,5953,97.0
266678,3/23/2023,Unknown,Vermont,,3208,22.0
266808,3/23/2023,Unknown,Virginia,,894,0.0


In [None]:
df[df["fips"].notnull()]

Unnamed: 0,date,county,state,fips,cases,deaths
0,1/1/2023,Autauga,Alabama,1001.0,18961,230.0
1,1/1/2023,Baldwin,Alabama,1003.0,67496,719.0
2,1/1/2023,Barbour,Alabama,1005.0,7027,111.0
3,1/1/2023,Bibb,Alabama,1007.0,7692,108.0
4,1/1/2023,Blount,Alabama,1009.0,17731,260.0
...,...,...,...,...,...,...
267004,3/23/2023,Sweetwater,Wyoming,56037.0,12519,139.0
267005,3/23/2023,Teton,Wyoming,56039.0,12150,16.0
267006,3/23/2023,Uinta,Wyoming,56041.0,6416,43.0
267007,3/23/2023,Washakie,Wyoming,56043.0,2700,51.0
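
These methods are needed because a missing value cannot be found with `==`. The sketch below uses a two-row stand-in frame (an assumption; the `Unknown` row mirrors the Alabama row with a missing `fips` shown above).

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "county": ["Autauga", "Unknown"],
    "state": ["Alabama", "Alabama"],
    "fips": [1001.0, np.nan],
})

missing = demo["fips"].isnull()
print(missing.tolist())
print(int(missing.sum()))                   # number of missing values

# NaN is not equal to anything, including itself, so == cannot find it.
print((demo["fips"] == np.nan).any())

# isna() and notna() are aliases of isnull() and notnull().
print(demo["fips"].isna().equals(missing))
```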


## **5️⃣ Conditional Selection with Multiple Columns**

We can filter using conditions on **different columns**.

### **Example — Select rows where:**

* State = **Wyoming**
* Cases > **1880**
* Deaths ≤ **22**

In [None]:
df[(df["state"] == "Wyoming") & (df["cases"] > 1880) & (df["deaths"] <= 22)]

Unnamed: 0,date,county,state,fips,cases,deaths
3241,1/1/2023,Johnson,Wyoming,56019.0,2310,21.0
3251,1/1/2023,Teton,Wyoming,56039.0,12010,16.0
6496,1/2/2023,Johnson,Wyoming,56019.0,2310,21.0
6506,1/2/2023,Teton,Wyoming,56039.0,12010,16.0
9750,1/3/2023,Johnson,Wyoming,56019.0,2314,21.0
...,...,...,...,...,...,...
260490,3/21/2023,Teton,Wyoming,56039.0,12150,16.0
263738,3/22/2023,Johnson,Wyoming,56019.0,2388,21.0
263748,3/22/2023,Teton,Wyoming,56039.0,12150,16.0
266995,3/23/2023,Johnson,Wyoming,56019.0,2388,21.0


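## **6️⃣ Readability Tip — Use `.query()`**

For **cleaner syntax**, you can write the whole condition as a string and pass it to `.query()`. It returns the same rows as the equivalent boolean mask.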

### **Example**

In [None]:
df.query("state == 'Wyoming' and cases > 1900 and deaths > 22")


Unnamed: 0,date,county,state,fips,cases,deaths
3232,1/1/2023,Albany,Wyoming,56001.0,11416,54.0
3233,1/1/2023,Big Horn,Wyoming,56003.0,3046,64.0
3234,1/1/2023,Campbell,Wyoming,56005.0,13666,156.0
3235,1/1/2023,Carbon,Wyoming,56007.0,4975,55.0
3236,1/1/2023,Converse,Wyoming,56009.0,3640,57.0
...,...,...,...,...,...,...
267003,3/23/2023,Sublette,Wyoming,56035.0,2324,28.0
267004,3/23/2023,Sweetwater,Wyoming,56037.0,12519,139.0
267006,3/23/2023,Uinta,Wyoming,56041.0,6416,43.0
267007,3/23/2023,Washakie,Wyoming,56043.0,2700,51.0
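
A sketch comparing `.query()` with the equivalent boolean mask on a small stand-in frame (an assumption, with values taken from rows shown in this notebook). It also uses the `@` prefix, which lets the query string refer to a Python variable; `min_cases` is a name made up for this example.

```python
import pandas as pd

demo = pd.DataFrame({
    "county": ["Autauga", "Albany", "Weston", "Teton"],
    "state": ["Alabama", "Wyoming", "Wyoming", "Wyoming"],
    "cases": [18961, 11416, 1880, 12010],
    "deaths": [230.0, 54.0, 22.0, 16.0],
})

with_query = demo.query("state == 'Wyoming' and cases > 1900 and deaths > 22")
with_mask = demo[
    (demo["state"] == "Wyoming") & (demo["cases"] > 1900) & (demo["deaths"] > 22)
]
print(with_query.equals(with_mask))

# Prefix a Python variable with @ to use it inside the query string.
min_cases = 1900
with_variable = demo.query("state == 'Wyoming' and cases > @min_cases")
print(with_variable["county"].tolist())
```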




## **7️⃣ Final Key Takeaways**

* Use **boolean masks** for simple conditions.
* Use **`&`**, **`|`**, and **`~`** to combine or negate conditions, and wrap each condition in parentheses.
* Use **`.isin()`** when filtering against multiple values.
* Use **`.isnull()`** & **`.notnull()`** to handle missing data.
* Use **`.query()`** for **cleaner code**; it can also be faster on large DataFrames, but that is not guaranteed.


In [None]:
df.loc[df["cases"] > 1500, ["date", "state", "cases"]]

Unnamed: 0,date,state,cases
0,1/1/2023,Alabama,18961
1,1/1/2023,Alabama,67496
2,1/1/2023,Alabama,7027
3,1/1/2023,Alabama,7692
4,1/1/2023,Alabama,17731
...,...,...,...
267004,3/23/2023,Wyoming,12519
267005,3/23/2023,Wyoming,12150
267006,3/23/2023,Wyoming,6416
267007,3/23/2023,Wyoming,2700


In [None]:
df.loc[df["state"].isin(["Texas", "Florida"]), ["date", "state", "cases"]]

Unnamed: 0,date,state,cases
324,1/1/2023,Florida,85571
325,1/1/2023,Florida,10410
326,1/1/2023,Florida,53685
327,1/1/2023,Florida,8859
328,1/1/2023,Florida,168824
...,...,...,...
266632,3/23/2023,Texas,8343
266633,3/23/2023,Texas,1262
266634,3/23/2023,Texas,4480
266635,3/23/2023,Texas,4411


In [None]:
df.loc[~df["state"].isin(["Wyoming", "Texas"]), ["date", "state", "cases"]]


Unnamed: 0,date,state,cases
0,1/1/2023,Alabama,18961
1,1/1/2023,Alabama,67496
2,1/1/2023,Alabama,7027
3,1/1/2023,Alabama,7692
4,1/1/2023,Alabama,17731
...,...,...,...
266981,3/23/2023,Wisconsin,140273
266982,3/23/2023,Wisconsin,16431
266983,3/23/2023,Wisconsin,7672
266984,3/23/2023,Wisconsin,64184


In [None]:
df.loc[df["deaths"].isnull(), ["date", "state", "deaths"]]

Unnamed: 0,date,state,deaths
2335,1/1/2023,Puerto Rico,
2336,1/1/2023,Puerto Rico,
2337,1/1/2023,Puerto Rico,
2338,1/1/2023,Puerto Rico,
2339,1/1/2023,Puerto Rico,
...,...,...,...
266162,3/23/2023,Puerto Rico,
266163,3/23/2023,Puerto Rico,
266164,3/23/2023,Puerto Rico,
266165,3/23/2023,Puerto Rico,


In [None]:
df.loc[df["deaths"].notnull(), ["date", "state", "deaths"]]

Unnamed: 0,date,state,deaths
0,1/1/2023,Alabama,230.0
1,1/1/2023,Alabama,719.0
2,1/1/2023,Alabama,111.0
3,1/1/2023,Alabama,108.0
4,1/1/2023,Alabama,260.0
...,...,...,...
267004,3/23/2023,Wyoming,139.0
267005,3/23/2023,Wyoming,16.0
267006,3/23/2023,Wyoming,43.0
267007,3/23/2023,Wyoming,51.0




# 🐼 **Pandas — Assigning Data to a DataFrame**

Assigning data in **Pandas** is very simple and powerful. You can:

* Assign **constant values** to entire columns.
* Assign **values based on conditions**.
* Assign **values from another column**.
* Create **new columns** or **update existing ones**.

In [None]:
df["county"][0]

'Autauga'

In [None]:
df["county"][0] = "Giza"

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["county"][0] = "Giza"
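
The warning appears because `df["county"][0] = "Giza"` is **chained indexing**: the first `[]` may hand back a temporary copy, so the assignment is not guaranteed to reach `df`. A minimal sketch of the usual alternative, on a small stand-in frame (an assumption so the example runs without the CSV file; `"Cairo"` is just a second made-up value), is to name the row and the column in a single `.loc` or `.at` call:

```python
import pandas as pd

demo = pd.DataFrame({
    "county": ["Autauga", "Baldwin"],
    "cases": [18961, 67496],
})

demo.loc[0, "county"] = "Giza"   # row label 0, column "county", one indexing step
demo.at[1, "county"] = "Cairo"   # .at does the same for a single cell

print(demo["county"].tolist())
```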


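## **1️⃣ Assign a Constant Value to a Column**

Assigning a single value fills **every row** of the column with it. Note that `county` already exists, so the cell below **overwrites** it; using a name that is not in the DataFrame yet, such as `df["country"] = "USA"`, would add a **new column** instead.
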
In [None]:
df["county"] = "USA"

In [None]:
df

Unnamed: 0,date,county,state,fips,cases,deaths
0,1/1/2023,USA,Alabama,1001.0,18961,230.0
1,1/1/2023,USA,Alabama,1003.0,67496,719.0
2,1/1/2023,USA,Alabama,1005.0,7027,111.0
3,1/1/2023,USA,Alabama,1007.0,7692,108.0
4,1/1/2023,USA,Alabama,1009.0,17731,260.0
...,...,...,...,...,...,...
267004,3/23/2023,USA,Wyoming,56037.0,12519,139.0
267005,3/23/2023,USA,Wyoming,56039.0,12150,16.0
267006,3/23/2023,USA,Wyoming,56041.0,6416,43.0
267007,3/23/2023,USA,Wyoming,56043.0,2700,51.0





## **2️⃣ Assign Values Based on a Condition**

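Let's add a new column `"high_cases"`:

* If **cases > 1500**, set **"Yes"**.
* Otherwise, set **"No"**.

The comparison on its own produces **`True`/`False`** values (first cell below); **`np.where`** then turns them into **"Yes"/"No"** (second cell).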






In [None]:
df["high_cases"] = df["cases"] > 1500

In [None]:
df

Unnamed: 0,date,county,state,fips,cases,deaths,high_cases
0,1/1/2023,USA,Alabama,1001.0,18961,230.0,True
1,1/1/2023,USA,Alabama,1003.0,67496,719.0,True
2,1/1/2023,USA,Alabama,1005.0,7027,111.0,True
3,1/1/2023,USA,Alabama,1007.0,7692,108.0,True
4,1/1/2023,USA,Alabama,1009.0,17731,260.0,True
...,...,...,...,...,...,...,...
267004,3/23/2023,USA,Wyoming,56037.0,12519,139.0,True
267005,3/23/2023,USA,Wyoming,56039.0,12150,16.0,True
267006,3/23/2023,USA,Wyoming,56041.0,6416,43.0,True
267007,3/23/2023,USA,Wyoming,56043.0,2700,51.0,True


In [None]:
import numpy as np
df["high_cases"] = np.where(df["cases"] > 1500, "Yes", "No")

In [None]:
df

Unnamed: 0,date,county,state,fips,cases,deaths,high_cases
0,1/1/2023,USA,Alabama,1001.0,18961,230.0,Yes
1,1/1/2023,USA,Alabama,1003.0,67496,719.0,Yes
2,1/1/2023,USA,Alabama,1005.0,7027,111.0,Yes
3,1/1/2023,USA,Alabama,1007.0,7692,108.0,Yes
4,1/1/2023,USA,Alabama,1009.0,17731,260.0,Yes
...,...,...,...,...,...,...,...
267004,3/23/2023,USA,Wyoming,56037.0,12519,139.0,Yes
267005,3/23/2023,USA,Wyoming,56039.0,12150,16.0,Yes
267006,3/23/2023,USA,Wyoming,56041.0,6416,43.0,Yes
267007,3/23/2023,USA,Wyoming,56043.0,2700,51.0,Yes
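
Every row shown above is "Yes", so the sketch below uses a three-row stand-in frame (an assumption, with case counts taken from rows in this notebook) that includes one value below the threshold. It also shows `Series.map` as an alternative way to get the same labels.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"cases": [18961, 1880, 894]})

is_high = demo["cases"] > 1500                       # boolean Series
demo["high_cases"] = np.where(is_high, "Yes", "No")  # "Yes" where True, else "No"

# Equivalent result using Series.map on the boolean values.
mapped = is_high.map({True: "Yes", False: "No"})

print(demo["high_cases"].tolist())
print(mapped.tolist())
```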


## **3️⃣ Assign Values from Another Column**
You can create a **copy of an existing column**:

In [None]:
df["cases_copy"] = df["cases"]

In [None]:
df

Unnamed: 0,date,county,state,fips,cases,deaths,high_cases,cases_copy
0,1/1/2023,USA,Alabama,1001.0,18961,230.0,Yes,18961
1,1/1/2023,USA,Alabama,1003.0,67496,719.0,Yes,67496
2,1/1/2023,USA,Alabama,1005.0,7027,111.0,Yes,7027
3,1/1/2023,USA,Alabama,1007.0,7692,108.0,Yes,7692
4,1/1/2023,USA,Alabama,1009.0,17731,260.0,Yes,17731
...,...,...,...,...,...,...,...,...
267004,3/23/2023,USA,Wyoming,56037.0,12519,139.0,Yes,12519
267005,3/23/2023,USA,Wyoming,56039.0,12150,16.0,Yes,12150
267006,3/23/2023,USA,Wyoming,56041.0,6416,43.0,Yes,6416
267007,3/23/2023,USA,Wyoming,56043.0,2700,51.0,Yes,2700



Or you can **calculate new columns** from existing ones.
For example, add a **mortality rate** column:

In [None]:
df["mortality_rate"] = df["deaths"] / df["cases"]

In [None]:
df

Unnamed: 0,date,county,state,fips,cases,deaths,high_cases,cases_copy,mortality_rate
0,1/1/2023,USA,Alabama,1001.0,18961,230.0,Yes,18961,0.012130
1,1/1/2023,USA,Alabama,1003.0,67496,719.0,Yes,67496,0.010652
2,1/1/2023,USA,Alabama,1005.0,7027,111.0,Yes,7027,0.015796
3,1/1/2023,USA,Alabama,1007.0,7692,108.0,Yes,7692,0.014041
4,1/1/2023,USA,Alabama,1009.0,17731,260.0,Yes,17731,0.014664
...,...,...,...,...,...,...,...,...,...
267004,3/23/2023,USA,Wyoming,56037.0,12519,139.0,Yes,12519,0.011103
267005,3/23/2023,USA,Wyoming,56039.0,12150,16.0,Yes,12150,0.001317
267006,3/23/2023,USA,Wyoming,56041.0,6416,43.0,Yes,6416,0.006702
267007,3/23/2023,USA,Wyoming,56043.0,2700,51.0,Yes,2700,0.018889
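
A calculated column like this can run into rows where the divisor is zero. The sketch below is one possible way to guard against that, not the only one: it uses a stand-in frame (an assumption) whose last row is made up to show the zero case, and `Series.where` to turn zero case counts into `NaN` before dividing.

```python
import pandas as pd

demo = pd.DataFrame({
    "county": ["Autauga", "Baldwin", "Hypothetical"],
    "cases": [18961, 67496, 0],     # the last row is made up to show the zero case
    "deaths": [230.0, 719.0, 0.0],
})

# Keep case counts that are not zero; zeros become NaN, so the division
# below yields NaN for those rows instead of dividing by zero.
safe_cases = demo["cases"].where(demo["cases"] != 0)
demo["mortality_rate"] = demo["deaths"] / safe_cases

print(demo["mortality_rate"].round(6).tolist())
```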


## **4️⃣ Assign Values Conditionally with `.loc[]`**

Using `.loc[]` gives you **full control** to update **only specific rows**.

Example: Set **cases = 2000** where **state = Wyoming**:

In [None]:
df.loc[df["state"] == "Wyoming", "cases"] = 2000

In [None]:
df[df["state"] == "Wyoming"]

Unnamed: 0,date,county,state,fips,cases,deaths,high_cases,cases_copy,mortality_rate
3232,1/1/2023,USA,Wyoming,56001.0,2000,54.0,Yes,11416,0.004730
3233,1/1/2023,USA,Wyoming,56003.0,2000,64.0,Yes,3046,0.021011
3234,1/1/2023,USA,Wyoming,56005.0,2000,156.0,Yes,13666,0.011415
3235,1/1/2023,USA,Wyoming,56007.0,2000,55.0,Yes,4975,0.011055
3236,1/1/2023,USA,Wyoming,56009.0,2000,57.0,Yes,3640,0.015659
...,...,...,...,...,...,...,...,...,...
267004,3/23/2023,USA,Wyoming,56037.0,2000,139.0,Yes,12519,0.011103
267005,3/23/2023,USA,Wyoming,56039.0,2000,16.0,Yes,12150,0.001317
267006,3/23/2023,USA,Wyoming,56041.0,2000,43.0,Yes,6416,0.006702
267007,3/23/2023,USA,Wyoming,56043.0,2000,51.0,Yes,2700,0.018889
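
Notice in the output above that `mortality_rate` still holds the values computed from the old case counts: assigning with `.loc` changes only the column you name. A minimal sketch of that behaviour on a stand-in frame (an assumption, with values from rows shown in this notebook):

```python
import pandas as pd

demo = pd.DataFrame({
    "state": ["Alabama", "Wyoming", "Wyoming"],
    "cases": [18961, 11416, 3046],
    "deaths": [230.0, 54.0, 64.0],
})
demo["mortality_rate"] = demo["deaths"] / demo["cases"]
rate_before = demo["mortality_rate"].copy()

# Update "cases" only on the rows where the mask is True.
demo.loc[demo["state"] == "Wyoming", "cases"] = 2000

print(demo["cases"].tolist())
# The derived column keeps its old values until it is recomputed.
print(demo["mortality_rate"].equals(rate_before))
```

Re-running the division after the update brings the derived column back in line.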


## **5️⃣ Assign Using `.assign()` Method (Chaining)**

The **`.assign()`** method returns a **new DataFrame** with the added (or replaced) columns and leaves the original DataFrame unchanged, which makes it convenient for method chaining:

In [None]:
df2 = df.assign(mortality_rate = df["deaths"] / df["cases"])

In [None]:
df2

Unnamed: 0,date,county,state,fips,cases,deaths,high_cases,cases_copy,mortality_rate
0,1/1/2023,USA,Alabama,1001.0,18961,230.0,Yes,18961,0.012130
1,1/1/2023,USA,Alabama,1003.0,67496,719.0,Yes,67496,0.010652
2,1/1/2023,USA,Alabama,1005.0,7027,111.0,Yes,7027,0.015796
3,1/1/2023,USA,Alabama,1007.0,7692,108.0,Yes,7692,0.014041
4,1/1/2023,USA,Alabama,1009.0,17731,260.0,Yes,17731,0.014664
...,...,...,...,...,...,...,...,...,...
267004,3/23/2023,USA,Wyoming,56037.0,2000,139.0,Yes,12519,0.069500
267005,3/23/2023,USA,Wyoming,56039.0,2000,16.0,Yes,12150,0.008000
267006,3/23/2023,USA,Wyoming,56041.0,2000,43.0,Yes,6416,0.021500
267007,3/23/2023,USA,Wyoming,56043.0,2000,51.0,Yes,2700,0.025500
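
`.assign()` also accepts functions, which is handy when one new column depends on another created in the same call. The sketch below uses a stand-in frame (an assumption), and `mortality_pct` is a column name made up for this example.

```python
import pandas as pd

demo = pd.DataFrame({
    "state": ["Alabama", "Wyoming"],
    "cases": [18961, 2000],
    "deaths": [230.0, 54.0],
})

# Each lambda receives the DataFrame being built, so later columns
# can use earlier ones created in the same call.
result = demo.assign(
    mortality_rate=lambda d: d["deaths"] / d["cases"],
    mortality_pct=lambda d: d["mortality_rate"] * 100,
)

print(result.columns.tolist())
print(demo.columns.tolist())   # the original frame has no new columns
```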


## **6️⃣ Assign Multiple Columns at Once**

You can update **two or more columns** in one line:

In [None]:
df[["cases", "deaths"]] = df[["cases", "deaths"]] * 2

In [None]:
df

Unnamed: 0,date,county,state,fips,cases,deaths,high_cases,cases_copy,mortality_rate
0,1/1/2023,USA,Alabama,1001.0,37922,460.0,Yes,18961,0.012130
1,1/1/2023,USA,Alabama,1003.0,134992,1438.0,Yes,67496,0.010652
2,1/1/2023,USA,Alabama,1005.0,14054,222.0,Yes,7027,0.015796
3,1/1/2023,USA,Alabama,1007.0,15384,216.0,Yes,7692,0.014041
4,1/1/2023,USA,Alabama,1009.0,35462,520.0,Yes,17731,0.014664
...,...,...,...,...,...,...,...,...,...
267004,3/23/2023,USA,Wyoming,56037.0,4000,278.0,Yes,12519,0.011103
267005,3/23/2023,USA,Wyoming,56039.0,4000,32.0,Yes,12150,0.001317
267006,3/23/2023,USA,Wyoming,56041.0,4000,86.0,Yes,6416,0.006702
267007,3/23/2023,USA,Wyoming,56043.0,4000,102.0,Yes,2700,0.018889


## **7️⃣ Summary Table — Assigning Data in Pandas**

| **Task**                         | **Code Example**                                    |
| -------------------------------- | --------------------------------------------------- |
| Add a constant column            | `df["country"] = "USA"`                             |
| Add column based on condition    | `df["high_cases"] = df["cases"] > 1500`             |
| Copy column                      | `df["cases_copy"] = df["cases"]`                    |
| Create new calculated column     | `df["mortality_rate"] = df["deaths"] / df["cases"]` |
| Assign conditionally with `.loc` | `df.loc[df["state"]=="Wyoming","cases"]=2000`       |
| Use `np.where` for Yes/No        | `df["high"]=np.where(df["cases"]>1500,"Yes","No")`  |
| Assign multiple columns          | `df[["cases","deaths"]]*=2`                         |
| Use `.assign()` method           | `df2=df.assign(rate=df["deaths"]/df["cases"])`      |




<div style="background-color:#28a745; padding:15px; border-radius:10px; text-align:center; display:flex; justify-content:center; align-items:center; gap:12px;">
    <img src="attachment:7e8693ed-5409-4d7d-ada6-b72d912289f6.jpg" alt="Evergreen Logo" style="width:70px; height:70px; border-radius:50%;">
    <h1 style="color:white; font-size:40px; font-weight:bold; margin:0;">
        Evergreen.ai
    </h1>

<div style="margin-left:auto; padding-right:10px;">
        <h2 style="color:white; font-size:30px; font-weight:600; margin:0;">
            Eng / Mahmoud Talaat ->
            Thanks
        </h2>
        <h2 style="color:white; font-size:30px; font-weight:600; margin:0;">
            01146544662
        </h2>
</div>
</div>
