## Pandas: Series and DataFrames

In [7]:
import pandas as pd

In [None]:
# Series
values = [10,20,30,40,50]
s = pd.Series(values)
print("Series:\n", s)

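# 'a' is used twice on purpose: index labels do not have to be unique
s1 = pd.Series(values, index=['a','b','c','d','a'])
print("Series with custom index:\n", s1)
# .loc['a'] matches both rows labelled 'a', so it returns a Series
print("Values at index 'a':\n", s1.loc['a'])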

# Data Frame
df = pd.DataFrame({
    'name': ['Mike', 'Bob', 'Alice'],
    'age': [44, 39, 29],
    'job': ['Architect', 'Engineer', 'Developer']
})
print("Data Frames:\n", df)

df1 = df.set_index('name')
print("Data Frames with custom index:\n", df1)

In [None]:
df2 = pd.DataFrame({
    'a': [1,2,3]
})

df3 = pd.DataFrame({
    'a': [4,5,6]
})
print("adding data frames:\n", df2 + df3)

In [None]:
# Import and Export Data

# reset index to default
# df1 = df1.reset_index()

# store data in csv file
df1.to_csv('mydata.csv')

# read csv file data
pd.read_csv('mydata.csv')


In [None]:
# to json data
df1.to_json('mydata.json')

df1.to_dict()

**Pandas** is a powerful **data analysis and manipulation library** in Python. It provides easy-to-use **data structures** and functions to work with **structured data**, especially **tabular data** (like Excel spreadsheets or SQL tables).

---

### 🔹 Key Features:

* **`DataFrame`**: 2D table-like data structure (rows and columns).
* **`Series`**: 1D labeled array (like a single column).
* **Data cleaning**: Handle missing data, filter, sort, group, merge, etc.
* **Data loading**: Read/write from CSV, Excel, SQL, JSON, and more.

---

### 🔹 Example:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)

print(df)
```

**Output:**

```
    Name  Age
0  Alice   25
1    Bob   30
```

---

### ✅ Why Use Pandas?

* Makes **data analysis** faster and easier.
* Widely used in **data science**, **machine learning**, and **finance**.

### Series and DataFrames
---

### ✅ **1. Series**

* A **Series** is like a **single column** in a DataFrame.
* Internally, it’s a **1D labeled array** (similar to a dictionary).
* Can be created from a list, dict, or NumPy array; the index defaults to `0, 1, 2, ...` unless you pass a custom one.

```python
import pandas as pd
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
```

* Index can have **duplicate values**.
* You can access elements using `.loc[index]`.

```python
s.loc['a']  # Returns value(s) for index 'a'
```
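
What `.loc` returns depends on whether the label is unique. A minimal sketch, reusing the duplicated-label Series from the notebook cell at the top (the variable names `unique_hit` and `repeated_hit` are only illustrative):

```python
import pandas as pd

# Index labels may repeat; here 'a' appears twice.
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'a'])

unique_hit = s.loc['b']     # a single scalar, because 'b' is unique
repeated_hit = s.loc['a']   # a Series holding both rows labelled 'a'

print(unique_hit)             # 20
print(repeated_hit.tolist())  # [10, 50]
```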

---

### ✅ **2. DataFrame**

* A **DataFrame** is a **2D labeled table** (like an Excel sheet or SQL table).
* Think of it as a **collection of Series** (columns).

```python
df = pd.DataFrame({
    "name": ["Mike", "Bob", "Alice"],
    "age": [30, 80, 45],
    "job": ["Programmer", "Clerk", "Designer"]
})
```

* Has default row indices: `0, 1, 2...`
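
The "collection of Series" idea can be checked directly. A small sketch using the same DataFrame as above (`ages` is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Mike", "Bob", "Alice"],
    "age": [30, 80, 45],
    "job": ["Programmer", "Clerk", "Designer"],
})

ages = df["age"]                     # selecting one column returns a Series
print(type(ages).__name__)           # Series
print(ages.index.equals(df.index))   # True: the column shares the row index
print(df.shape)                      # (3, 3) -> 3 rows, 3 columns
```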

---

### ✅ **3. Setting the Index**

* You can **change the index** to a specific column:

```python
df = df.set_index("name")  # Sets "name" column as index
```

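* Always **assign the result back** (i.e. `df = df.set_index(...)`): `set_index` returns a new DataFrame and leaves the original unchanged.

  > Otherwise, the change is lost.

* ⚠️ Avoid `inplace=True`: it works, but it returns `None`, which breaks method chaining, and reassignment is generally the preferred style.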
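
A short sketch of why the reassignment matters, assuming a two-row DataFrame made up for the demonstration (`indexed` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Mike", "Bob"], "age": [30, 80]})

df.set_index("name")             # result discarded, so df is unchanged
print(list(df.columns))          # ['name', 'age']

indexed = df.set_index("name")   # keep the returned DataFrame
print(list(indexed.columns))     # ['age']
print(list(indexed.index))       # ['Mike', 'Bob']
```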

---

### ✅ **4. Indexing with `.loc`**

* Use `.loc[index_label]` to access rows by index label:

```python
df.loc["Mike"]  # Accesses the row where name == "Mike"
```
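
`.loc` also accepts a column label and lists of labels. A minimal sketch, assuming the DataFrame from section 2 with `name` set as the index (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Mike", "Bob", "Alice"],
    "age": [30, 80, 45],
    "job": ["Programmer", "Clerk", "Designer"],
}).set_index("name")

row = df.loc["Mike"]                          # one row, returned as a Series
age = df.loc["Bob", "age"]                    # one cell: row label, then column label
subset = df.loc[["Mike", "Alice"], ["job"]]   # lists of labels return a DataFrame

print(row["job"])     # Programmer
print(age)            # 80
print(subset.shape)   # (2, 1)
```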

---

### ✅ **5. Index Alignment in Operations**

* When performing **arithmetic operations** on DataFrames, pandas aligns rows **by index label** (and columns by name), not by position.

```python
df1 = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"A": [10, 20, 30]}, index=[1, 2, 0])

result = df1 + df2
```

* Result aligns rows by index values:

```
   A
0  31   # 1 + 30
1  12   # 2 + 10
2  23   # 3 + 20
```
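
Alignment also decides what happens when a label exists on only one side. The sketch below uses two made-up frames, `left` and `right`, whose indexes only partly overlap; `DataFrame.add` with `fill_value` is one way to avoid the resulting `NaN` values:

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"A": [10, 20, 30]}, index=[1, 2, 3])

plain = left + right                     # labels 0 and 3 have no partner -> NaN
filled = left.add(right, fill_value=0)   # treat the missing side as 0

print(plain["A"].isna().tolist())   # [True, False, False, True]
print(filled["A"].tolist())         # [1.0, 12.0, 23.0, 30.0]
```

The result index is the union of both indexes, and the values become floats because `NaN` cannot be stored in an integer column.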

---

### ✅ **6. DataFrame Operations Summary**

You can do many operations with DataFrames:

* Filter, query, iterate, sort, group, merge, concatenate, join, etc. (for example `query()`, `sort_values()`, `groupby()`, `merge()`, `pd.concat()`, `join()`).
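
A brief tour of a few of these operations. This is only a sketch: the extra row `Carol`, the `team` column, and the `teams` lookup table are invented for the example.

```python
import pandas as pd

# Hypothetical data: 'Carol', 'team', and the 'teams' table are made up.
people = pd.DataFrame({
    "name": ["Mike", "Bob", "Alice", "Carol"],
    "age": [30, 80, 45, 45],
    "team": ["x", "y", "x", "y"],
})

older = people[people["age"] > 40]                    # boolean filter
same = people.query("age > 40")                       # same filter as a query string
by_age = people.sort_values("age", ascending=False)   # oldest first
mean_age = people.groupby("team")["age"].mean()       # one value per team

teams = pd.DataFrame({"team": ["x", "y"], "floor": [1, 2]})
merged = people.merge(teams, on="team")               # SQL-style join on a shared column
stacked = pd.concat([people, people], ignore_index=True)  # rows appended end to end

print(older["name"].tolist())   # ['Bob', 'Alice', 'Carol']
print(mean_age.to_dict())       # {'x': 37.5, 'y': 62.5}
```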

---

### 🧠 Summary Pointers:

| Concept           | Description                                                  |
| ----------------- | ------------------------------------------------------------ |
| `Series`          | 1D labeled array (like a dict); single column in DataFrame   |
| `DataFrame`       | 2D labeled data structure; collection of Series              |
| `Index`           | Labels for rows (default: 0,1,2,...); can be customized      |
| `.loc[]`          | Access rows by index label                                   |
| `set_index()`     | Set a specific column as index (reassign to make persistent) |
| `Index Alignment` | Arithmetic operations align on index, not on row order       |
| `Avoid inplace`   | Prefer assigning result to a variable over `inplace=True`    |

---


### Import & Export Data: **Pandas Export & Import Concepts Summary**

---

### ✅ **1. Resetting the Index**

* Before exporting, you often want to **reset the index** so it becomes a regular column (instead of being used as a row label).

```python
df = df.reset_index()
```
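
A small sketch of what `reset_index` does to a DataFrame indexed by `name`; the `drop=True` variant is shown as an aside for the case where the old index should be thrown away instead of kept (`flat` and `dropped` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Mike", "Bob"], "age": [30, 80]}).set_index("name")

flat = df.reset_index()               # 'name' moves back into the columns
dropped = df.reset_index(drop=True)   # discard the old index entirely

print(list(flat.columns))      # ['name', 'age']
print(list(flat.index))        # [0, 1]
print(list(dropped.columns))   # ['age']
```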

---

### ✅ **2. Exporting a DataFrame**

You can export a DataFrame to many formats using `.to_<format>()`.

#### Common Export Formats and Methods:

| Format  | Method                          | Notes                                |
| ------- | ------------------------------- | ------------------------------------ |
| CSV     | `df.to_csv("filename.csv")`     | Most commonly used                   |
| JSON    | `df.to_json("filename.json")`   | Exports as a dictionary-like JSON    |
| Excel   | `df.to_excel("filename.xlsx")`  | Requires `openpyxl` or `xlsxwriter`  |
| HTML    | `df.to_html("filename.html")`   | Renders as `<table>` in HTML         |
| Dict    | `df.to_dict()`                  | Converts to a Python dictionary      |
| Parquet | `df.to_parquet("file.parquet")` | Binary columnar format, large tables |
| Pickle  | `df.to_pickle("file.pkl")`      | Serializes DataFrame (Python-native) |
| SQL     | `df.to_sql(...)`                | Used with databases                  |
| XML     | `df.to_xml("filename.xml")`     | Exports as XML                       |

#### 👉 **Avoid Unwanted Index in Export**

Use `index=False` to prevent exporting the index as a column:

```python
df.to_csv("filename.csv", index=False)
```
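
To compare the two variants without touching the disk, `to_csv` can be called without a path, in which case it returns the CSV text as a string. A minimal sketch with a made-up two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Mike", "Bob"], "age": [30, 80]})

with_index = df.to_csv()                 # default: index written as the first column
without_index = df.to_csv(index=False)   # index left out

print(with_index.splitlines()[0])      # ,name,age
print(with_index.splitlines()[1])      # 0,Mike,30
print(without_index.splitlines()[0])   # name,age
```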

---

### ✅ **3. Viewing Exported CSV in Terminal**

* You can verify the export by running:

```bash
cat filename.csv
```

* Without `index=False`, the index is written as an extra first column; its header is empty unless the index has a name.

---

### ✅ **4. Importing a DataFrame**

Use `pd.read_<format>()` to import.

#### Common Import Methods:

| Format  | Method                            | Notes                                  |
| ------- | --------------------------------- | -------------------------------------- |
| CSV     | `pd.read_csv("filename.csv")`     | Exported index appears as `Unnamed: 0` |
| JSON    | `pd.read_json("filename.json")`   | Can load structured records            |
| Excel   | `pd.read_excel("filename.xlsx")`  | Can read specific sheets               |
| HTML    | `pd.read_html("filename.html")`   | Returns list of tables                 |
| Parquet | `pd.read_parquet("file.parquet")` | Fast and efficient                     |
| Pickle  | `pd.read_pickle("file.pkl")`      | Use only with trusted sources          |
| SQL     | `pd.read_sql(...)`                | Read from SQL databases                |
| XML     | `pd.read_xml("filename.xml")`     | Parses XML structure                   |

---

### ✅ **5. Handling Index Column on Import**

* If the index column was unintentionally exported:

```python
df = pd.read_csv("filename.csv", index_col=0)  # Treat first column as index
```

* If index was **not** exported:

```python
df = pd.read_csv("filename.csv")  # No need for index_col
```
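
The effect of `index_col=0` can be seen in a round trip. This sketch keeps everything in memory by reading from an `io.StringIO` buffer instead of a file, which is an assumption made only to keep the example self-contained:

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Mike", "Bob"], "age": [30, 80]})
csv_text = df.to_csv()   # index included in the output

naive = pd.read_csv(io.StringIO(csv_text))               # index comes back as a data column
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column restored as the index

print(list(naive.columns))   # ['Unnamed: 0', 'name', 'age']
print(list(fixed.columns))   # ['name', 'age']
```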

---

### ✅ **6. JSON Export Format**

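* With `orient="index"`, each index label becomes a top-level **key**, as in the output below. The default `orient="columns"` instead uses column names as the top-level keys and index labels as the inner keys.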

```json
{
  "0": {"name": "Mike", "age": 30, "job": "Programmer"},
  "1": {"name": "Bob", "age": 80, "job": "Clerk"},
  ...
}
```
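
The layout of the exported JSON is controlled by the `orient` argument of `to_json`. A sketch comparing three common values on a made-up two-row DataFrame; the output is parsed back with `json.loads` so the structure is easy to inspect:

```python
import json
import pandas as pd

df = pd.DataFrame({"name": ["Mike", "Bob"], "age": [30, 80]})

by_column = json.loads(df.to_json())                 # default orient="columns"
by_row = json.loads(df.to_json(orient="index"))      # index labels as top-level keys
records = json.loads(df.to_json(orient="records"))   # list of row objects, index dropped

print(by_column)   # {'name': {'0': 'Mike', '1': 'Bob'}, 'age': {'0': 30, '1': 80}}
print(by_row)      # {'0': {'name': 'Mike', 'age': 30}, '1': {'name': 'Bob', 'age': 80}}
print(records)     # [{'name': 'Mike', 'age': 30}, {'name': 'Bob', 'age': 80}]
```

Note that JSON object keys are always strings, so the integer index labels come back as `"0"` and `"1"`.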

---

### ✅ **7. `.to_dict()`**

* Converts DataFrame to a native Python dictionary.

```python
df.to_dict()
```

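* The default layout is `{column: {index_label: value}}`, mirroring the default JSON export, but the result is a plain Python `dict` whose index keys keep their original types (e.g. integers, not strings).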
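
Like `to_json`, `to_dict` takes an `orient` argument. A minimal sketch of three layouts, using a made-up two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Mike", "Bob"], "age": [30, 80]})

nested = df.to_dict()                 # default orient="dict": {column: {index: value}}
rows = df.to_dict(orient="records")   # one dict per row
columns = df.to_dict(orient="list")   # {column: [values]}

print(nested)    # {'name': {0: 'Mike', 1: 'Bob'}, 'age': {0: 30, 1: 80}}
print(rows)      # [{'name': 'Mike', 'age': 30}, {'name': 'Bob', 'age': 80}]
print(columns)   # {'name': ['Mike', 'Bob'], 'age': [30, 80]}
```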

---

### ✅ **8. HTML Export**

* Converts DataFrame into a valid HTML table.

```python
df.to_html("filename.html")
```

* Useful for embedding in web pages.
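
When the markup is needed inside a program rather than in a file, `to_html` can be called without a path and returns the table as a string. A small sketch with a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Mike", "Bob"], "age": [30, 80]})

html = df.to_html(index=False)   # no path given, so the markup is returned as a string

print(html.startswith("<table"))   # True
print("<td>Mike</td>" in html)     # True
```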

---

## 🧠 Final Tips

| Tip                                          | Description                                                                                                      |
| -------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| Use `index=False`                            | Pass it when exporting if you don't want the index in the file.                                                  |
| **Inspect your export**                      | Check the file (e.g. using `cat` or opening it in Excel) to ensure the structure is correct.                     |
| Choose the **format based on your use case** | CSV for simple tabular data; JSON for structured data; Pickle/Parquet for speed; HTML/XML for web/data exchange. |

