In [1]:
import pandas as pd         # type: ignore

In [2]:
df = pd.read_csv("../datasets/DataSet.csv")
df.head()

| | CustomerID | First_Name | Last_Name | Paying Customer | Do_Not_Contact | Not_Useful_Column |
|---|---|---|---|---|---|---|
| 0 | 1001 | Frodo | Baggins | Yes | No | True |
| 1 | 1002 | Abed | Nadir | No | Yes | False |
| 2 | 1003 | Walter | /White | N | NaN | True |
| 3 | 1004 | Dwight | Schrute | Yes | Y | True |
| 4 | 1005 | Jon | Snow | Y | No | True |


> ### 👉 Use **`to_string()`** to **print the entire DataFrame**, ignoring the display row limit.

In [4]:
print(df.to_string()) 

    CustomerID First_Name    Last_Name Paying Customer Do_Not_Contact  Not_Useful_Column
0         1001      Frodo      Baggins             Yes             No               True
1         1002       Abed        Nadir              No            Yes              False
2         1003     Walter       /White               N            NaN               True
3         1004     Dwight      Schrute             Yes              Y               True
4         1005        Jon         Snow               Y             No               True
5         1006        Ron      Swanson             Yes            Yes               True
6         1007       Jeff       Winger              No             No              False
7         1008   Sherlock       Holmes               N             No              False
8         1009    Gandalf          NaN             Yes            NaN              False
9         1010      Peter       Parker             Yes             No               True
10        1011    Sam

## 🔢 Controlling the Number of Displayed Rows in Pandas  

### 📌 **Pandas limits the number of displayed rows to improve readability.**  
To check the **maximum number of rows Pandas will display** before truncating, use:  

```python
pd.options.display.max_rows
```

🔍 If your DataFrame has **more rows than this limit**, Pandas **truncates** the output, showing only the first and last few rows separated by `...`. You can **increase or decrease** the limit by changing the setting:  

```python
pd.options.display.max_rows = 100  # Set max displayed rows to 100
```

🚀 **Tip**: Adjust this setting **wisely**: very high values make large DataFrames slow to render and flood the notebook output!  

In [6]:
pd.options.display.max_rows

60

In [8]:
pd.options.display.max_rows = 100
df

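| | CustomerID | First_Name | Last_Name | Paying Customer | Do_Not_Contact | Not_Useful_Column |
|---|---|---|---|---|---|---|
| 0 | 1001 | Frodo | Baggins | Yes | No | True |
| 1 | 1002 | Abed | Nadir | No | Yes | False |
| 2 | 1003 | Walter | /White | N | NaN | True |
| 3 | 1004 | Dwight | Schrute | Yes | Y | True |
| 4 | 1005 | Jon | Snow | Y | No | True |
| 5 | 1006 | Ron | Swanson | Yes | Yes | True |
| 6 | 1007 | Jeff | Winger | No | No | False |
| 7 | 1008 | Sherlock | Holmes | N | No | False |
| 8 | 1009 | Gandalf | NaN | Yes | NaN | False |
| 9 | 1010 | Peter | Parker | Yes | No | True |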
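The same limit can also be read and changed through pandas' option functions, and `pd.option_context` lifts it only temporarily. A minimal sketch; the 200-row `big` frame is illustrative data, not part of the dataset:

```python
import pandas as pd

# Read the current limit (60 in a fresh pandas session)
print(pd.get_option("display.max_rows"))

# Change it globally; same effect as pd.options.display.max_rows = 100
pd.set_option("display.max_rows", 100)
print(pd.options.display.max_rows)

# Go back to the library default
pd.reset_option("display.max_rows")

big = pd.DataFrame({"n": range(200)})

# Lift the limit only inside the `with` block; None means "no limit"
with pd.option_context("display.max_rows", None):
    full = repr(big)    # every one of the 200 rows

truncated = repr(big)   # default limit again: head and tail rows only
print(len(full.splitlines()), len(truncated.splitlines()))
```

The `with` form is handy in notebooks: the setting cannot leak into later cells.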


<div style="background-color: #c90016 ; color: #ffffff; width: 100%; height: 50px; text-align: center; font-weight: bold; line-height: 50px; margin: 10px 0; font-size: 24px;">
📂 Saving Data in CSV & XLSX Files 
</div>

### 📝 **Save DataFrame as a CSV File**  
To save your DataFrame in a **CSV format**, use:  

```python
df.to_csv("filename.csv", index=False)  # Save without the index
```

💡 **Tip**: `index=True` is the default, so leave out `index=False` if you want to keep the index as the first column of the file.  
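To see what `index` changes, you can write to a string instead of a file: `to_csv()` returns the CSV text when no path is given. A minimal sketch with a small illustrative `grades` frame:

```python
import io
import pandas as pd

# Small illustrative frame with a labelled index
grades = pd.DataFrame({"Math": [1, 2], "Physics": [6, 7]}, index=["a", "b"])

with_index = grades.to_csv()                # default index=True
without_index = grades.to_csv(index=False)  # data columns only

print(with_index)     # header starts with an empty cell for the index
print(without_index)

# Reading the first version back turns the index into an "Unnamed: 0" column
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
```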

---
### 📊 **Save DataFrame as an Excel File**  
To save your DataFrame in **Excel format (.xlsx)**, use:  

```python
df.to_excel("filename.xlsx", index=False, sheet_name="Sheet1")
```

✅ **Note**: Make sure you have `openpyxl` installed to write Excel files. Install it with:  
```bash
pip install openpyxl
```

In [9]:
w = pd.Series({'a':1  ,'b':2  ,'c':3  ,'d':4  ,'e':5 })
x = pd.Series({'a':6  ,'b':7  ,'c':8  ,'d':9  ,'e':10})
y = pd.Series({'a':11 ,'b':12 ,'c':13 ,'d':14 ,'e':15})
z = pd.Series({'a':16 ,'b':17 ,'c':18 ,'d':19 ,'e':20})

grades = pd.DataFrame({'Math':w,'Physics':x,'French':y,'Chemistry':z})
grades

| | Math | Physics | French | Chemistry |
|---|---|---|---|---|
| a | 1 | 6 | 11 | 16 |
| b | 2 | 7 | 12 | 17 |
| c | 3 | 8 | 13 | 18 |
| d | 4 | 9 | 14 | 19 |
| e | 5 | 10 | 15 | 20 |


In [10]:
grades.to_csv("../datasets/DataSet2.csv")       # overwrites the file if it already exists

> ![image.png](attachment:image.png)

In [11]:
grades.to_excel("../datasets/DataSet3.xlsx")

> ![image.png](attachment:image.png)

In [12]:
grades.to_excel("../datasets/DataSet4.xlsx", sheet_name="it's optional")   # sheet_name is optional; the default is "Sheet1"

> ![image.png](attachment:image.png)

In [13]:
grades1 = pd.read_csv('../datasets/DataSet2.csv')
grades1

| | Unnamed: 0 | Math | Physics | French | Chemistry |
|---|---|---|---|---|---|
| 0 | a | 1 | 6 | 11 | 16 |
| 1 | b | 2 | 7 | 12 | 17 |
| 2 | c | 3 | 8 | 13 | 18 |
| 3 | d | 4 | 9 | 14 | 19 |
| 4 | e | 5 | 10 | 15 | 20 |


In [14]:
grades2 = pd.read_excel('../datasets/DataSet3.xlsx')
grades2

| | Unnamed: 0 | Math | Physics | French | Chemistry |
|---|---|---|---|---|---|
| 0 | a | 1 | 6 | 11 | 16 |
| 1 | b | 2 | 7 | 12 | 17 |
| 2 | c | 3 | 8 | 13 | 18 |
| 3 | d | 4 | 9 | 14 | 19 |
| 4 | e | 5 | 10 | 15 | 20 |
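Both reads show an extra `Unnamed: 0` column: `to_csv` and `to_excel` write the index as a first column with a blank header, and `read_csv` / `read_excel` treat it as ordinary data. Passing `index_col=0` when reading turns that column back into the index (`read_excel` accepts the same parameter). A minimal in-memory sketch; the small `grades` frame and the `io.StringIO` buffer stand in for the real file:

```python
import io
import pandas as pd

grades = pd.DataFrame({"Math": [1, 2, 3], "Physics": [6, 7, 8]}, index=["a", "b", "c"])

buffer = io.StringIO()
grades.to_csv(buffer)   # index written as a blank-header first column
buffer.seek(0)          # rewind so read_csv starts at the beginning

# index_col=0 turns that first column back into the index
restored = pd.read_csv(buffer, index_col=0)
print(restored)
print(restored.equals(grades))
```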


<div style="background-color: #c90016 ; color: #ffffff; width: 100%; height: 50px; text-align: center; font-weight: bold; line-height: 50px; margin: 10px 0; font-size: 24px;">
Header
</div>

## 🏷️ Handling Headers in Pandas  

When reading a CSV file, the `header` parameter determines which row, if any, supplies the column names.  

| **Parameter**  | **Behavior**  | **Effect on DataFrame**  |
|---------------|--------------|--------------------------|
| `header=0`    | Default behavior (`header='infer'` acts like `header=0` when `names` is not passed) | The first row is treated as column headers (not part of the data). |
| `header=None` | No header row | All rows are treated as data, and default numeric column names (0,1,2...) are assigned. |

### 📝 **Example Usage**  

```python
# First row is used as column names
df = pd.read_csv("data.csv", header=0)  

# Treats all rows as data (column names will be auto-generated as 0,1,2,...)
df = pd.read_csv("data.csv", header=None)
```


In [15]:
data1 = pd.read_csv("../datasets/DataSet2.csv", header=0)    # default
data1.head()

| | Unnamed: 0 | Math | Physics | French | Chemistry |
|---|---|---|---|---|---|
| 0 | a | 1 | 6 | 11 | 16 |
| 1 | b | 2 | 7 | 12 | 17 |
| 2 | c | 3 | 8 | 13 | 18 |
| 3 | d | 4 | 9 | 14 | 19 |
| 4 | e | 5 | 10 | 15 | 20 |


In [16]:
data1[:1]

| | Unnamed: 0 | Math | Physics | French | Chemistry |
|---|---|---|---|---|---|
| 0 | a | 1 | 6 | 11 | 16 |


In [17]:
data2 = pd.read_csv("../datasets/DataSet2.csv", header=None) 
data2.head()

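| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | NaN | Math | Physics | French | Chemistry |
| 1 | a | 1 | 6 | 11 | 16 |
| 2 | b | 2 | 7 | 12 | 17 |
| 3 | c | 3 | 8 | 13 | 18 |
| 4 | d | 4 | 9 | 14 | 19 |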


In [18]:
data2[:1]

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | NaN | Math | Physics | French | Chemistry |
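With `header=None` the header line is not just relabeled: it becomes data, so a column such as `1` mixes the text `Math` with numbers and is no longer numeric. A minimal sketch using a small inline CSV (the string `csv_text` is illustrative, shaped like `DataSet2.csv`):

```python
import io
import pandas as pd

csv_text = ",Math,Physics\na,1,6\nb,2,7\n"

# header=0 (default): "Math" and "Physics" become column labels; the values stay numeric
as_header = pd.read_csv(io.StringIO(csv_text), header=0)
print(as_header.dtypes)

# header=None: the header line becomes row 0, so the values are kept as text
as_data = pd.read_csv(io.StringIO(csv_text), header=None)
print(as_data.dtypes)
print(as_data.loc[0].tolist())
```

This is why `header=None` is meant for files that genuinely have no header row.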


<div style="background-color: #c90016 ; color: #ffffff; width: 100%; height: 50px; text-align: center; font-weight: bold; line-height: 50px; margin: 10px 0; font-size: 24px;">
🏷️ Naming Columns in Pandas
</div>

When reading a CSV file, you can manually assign column names using the `names` parameter in `pd.read_csv()`.  

### **📌 Example Usage**
```python
import pandas as pd  

# Define custom column names  
names = ['a', 'b', 'c', 'd', 'e']  

# Read the CSV file and apply custom column names  
data0 = pd.read_csv('DataSet2.csv', names=names)  

# Display the DataFrame  
print(data0)
```

In [19]:
data = pd.read_csv("../datasets/DataSet2.csv")  
data.head()

| | Unnamed: 0 | Math | Physics | French | Chemistry |
|---|---|---|---|---|---|
| 0 | a | 1 | 6 | 11 | 16 |
| 1 | b | 2 | 7 | 12 | 17 |
| 2 | c | 3 | 8 | 13 | 18 |
| 3 | d | 4 | 9 | 14 | 19 |
| 4 | e | 5 | 10 | 15 | 20 |


In [21]:
names = ['a', 'b', 'c', 'd', 'e']
data0 = pd.read_csv('../datasets/DataSet2.csv', names=names)  
data0

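| | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | NaN | Math | Physics | French | Chemistry |
| 1 | a | 1 | 6 | 11 | 16 |
| 2 | b | 2 | 7 | 12 | 17 |
| 3 | c | 3 | 8 | 13 | 18 |
| 4 | d | 4 | 9 | 14 | 19 |
| 5 | e | 5 | 10 | 15 | 20 |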


### **🤔 Why Did This Happen?**
- The CSV file **already had a header row**, but passing `names=` makes `header` default to `None`, so Pandas **did not skip that row**; instead, it **treated it as a regular row** of data.  
- The original column headers (`Math`, `Physics`, `French`, `Chemistry`) therefore ended up in the first data row (index `0`), while our custom names (`a, b, c, d, e`) became the column labels.  
- Column `a` is `NaN` in that row because the header line starts with an empty field: the index of `grades` had no name, so `to_csv` wrote a blank cell there.

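### **✅ Solution (Correct Usage)**
Pass `header=0` together with `names`: Pandas reads row 0 as the header and then replaces it with the custom names, so the old header no longer appears as data.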

In [23]:
data0 = pd.read_csv('../datasets/DataSet2.csv', header=0, names=names)
data0

| | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | a | 1 | 6 | 11 | 16 |
| 1 | b | 2 | 7 | 12 | 17 |
| 2 | c | 3 | 8 | 13 | 18 |
| 3 | d | 4 | 9 | 14 | 19 |
| 4 | e | 5 | 10 | 15 | 20 |
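Both behaviours can be reproduced without the file. A minimal sketch, assuming a tiny inline CSV with the same shape as `DataSet2.csv` (the names `key`, `m`, `p` are arbitrary):

```python
import io
import pandas as pd

csv_text = ",Math,Physics\na,1,6\nb,2,7\n"
names = ["key", "m", "p"]

# names alone: header falls back to None, so the old header line is kept as row 0
kept = pd.read_csv(io.StringIO(csv_text), names=names)
print(kept)

# names + header=0: row 0 is consumed as the header, then replaced by names
replaced = pd.read_csv(io.StringIO(csv_text), names=names, header=0)
print(replaced)
```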


<div style="background-color: #c90016 ; color: #ffffff; width: 100%; height: 50px; text-align: center; font-weight: bold; line-height: 50px; margin: 10px 0; font-size: 24px;">
head, tail
</div>


In [24]:
data1 ={'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada','Nevada','Nevada'],
        'year' : [2000, 2001, 2002, 2001, 2002, 2003,2004,2005],
        'pop'  : [1.5, 1.7, 3.6, 2.4, 2.9, 3.2,4.3,2.5]}

frame1 = pd.DataFrame(data1)
frame1

| | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |
| 5 | Nevada | 2003 | 3.2 |
| 6 | Nevada | 2004 | 4.3 |
| 7 | Nevada | 2005 | 2.5 |


In [25]:
frame1.head()

| | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |


In [26]:
frame1.tail()

| | state | year | pop |
|---|---|---|---|
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |
| 5 | Nevada | 2003 | 3.2 |
| 6 | Nevada | 2004 | 4.3 |
| 7 | Nevada | 2005 | 2.5 |
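`head()` and `tail()` return the first and last 5 rows by default; both accept a row count `n`, and a negative `n` in `head()` keeps everything except the last `|n|` rows. A short sketch that rebuilds `frame1` so it runs on its own:

```python
import pandas as pd

frame1 = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002, 2003, 2004, 2005],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2, 4.3, 2.5],
})

print(frame1.head(3))   # first 3 rows instead of the default 5
print(frame1.tail(2))   # last 2 rows
print(frame1.head(-6))  # negative n: all rows except the last 6
```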
