# Tutorial 10 – Python Pandas Tutorials – Part 2

In this lesson we will cover:  
- **StringIO** for in-memory text streams  
- **Pandas read_csv** for reading CSV data (files on disk, in-memory strings, URLs, etc.)  
- **Reading different data sources** with options such as `usecols` and `index_col`, plus `dtype` for setting data types  
- **Exporting data** with `to_csv`

Let's get started with step-by-step examples and in-depth explanations! 😎🚀

---

## 1. Using StringIO for In‑Memory File Objects 📄✨

**Concept:**  
`StringIO` lets a plain string behave like a file. This is very useful for testing, especially when you want to try out CSV data without creating a real file.

### Basic Example

```python
from io import StringIO
import pandas as pd

# Define the CSV content as a string
data = ('col1,col2,col3\n'
        'x,y,1\n'
        'a,b,2\n'
        'c,d,3')

# Wrap the string in a file-like object
file_like = StringIO(data)
print(file_like)  # Output: <_io.StringIO object at ...>

# Read the CSV from this file-like object into a DataFrame
df = pd.read_csv(file_like)
print(df.head())
```

**Explanation:**
1. **Import StringIO:**  
   `from io import StringIO` imports the StringIO class, which makes a string behave like a file.
2. **CSV String Creation:**  
   The `data` variable holds a multi-line string that simulates CSV content.
3. **Convert to File-like Object:**  
   `StringIO(data)` wraps the string in an object that pandas can treat as a file.
4. **Read CSV:**  
   `pd.read_csv(file_like)` reads the CSV data from this file-like object and builds a DataFrame.
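A `StringIO` object is also writable, just like a text file opened for writing. The sketch below (variable names are illustrative) builds the CSV text with `write()`, rewinds, and reads it with `read_csv`. It also shows that `read_csv` consumes the stream: a second call on the same, already-read object finds no data and raises `pandas.errors.EmptyDataError`.

```python
from io import StringIO
import pandas as pd

buf = StringIO()
buf.write('col1,col2,col3\n')   # write() works as it does on a text file
buf.write('x,y,1\na,b,2\n')
buf.seek(0)                      # rewind before reading

df_buf = pd.read_csv(buf)
print(df_buf)

# The stream is now exhausted, so a second read has nothing to parse
try:
    pd.read_csv(buf)
    exhausted = False
except pd.errors.EmptyDataError:
    exhausted = True
print("Stream exhausted:", exhausted)
```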

---

### Extra Example 1: Read CSV and Select Specific Columns

```python
file_like = StringIO(data)  # create a fresh file-like object; the previous one was already read to the end
df_subset = pd.read_csv(file_like, usecols=['col1', 'col2'])
print("Selected Columns:\n", df_subset)
```

**Explanation:**
- **`usecols=['col1', 'col2']`** tells pandas that only the "col1" and "col2" columns are needed.  
- Columns that are not listed are skipped while parsing, which saves memory when you only need part of the data.
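Besides a list of names, `usecols` also accepts a function. A minimal sketch, assuming the same three-column data as above: pandas calls the function with each column name and keeps the column when it returns `True`.

```python
from io import StringIO
import pandas as pd

data = ('col1,col2,col3\n'
        'x,y,1\n'
        'a,b,2\n'
        'c,d,3')

# Keep every column except 'col3'
df_callable = pd.read_csv(StringIO(data), usecols=lambda name: name != 'col3')
print(df_callable.columns.tolist())  # ['col1', 'col2']
```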

---

### Extra Example 2: Handling Different Delimiters

```python
data_tab = "a\tb\tc\n1\t2\t3\n4\t5\t6"
file_tab = StringIO(data_tab)
df_tab = pd.read_csv(file_tab, sep='\t')
print("Tab-separated DataFrame:\n", df_tab)
```

**Explanation:**
- **Tab-Separated Data:**  
  Here `\t` (a tab character) separates the columns.
- **`sep='\t'`:**  
  Tells pandas that the delimiter is a tab, so it can split the data into columns correctly.
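If you don't know the delimiter in advance, pandas can try to detect it: with `sep=None` and `engine='python'`, it guesses the separator using Python's `csv.Sniffer`. The semicolon-separated string below is made-up sample data; detection is a heuristic, so an explicit `sep` is safer when you know the format.

```python
from io import StringIO
import pandas as pd

data_semi = "x;y;z\n1;2;3\n4;5;6"

# sep=None asks pandas to guess the delimiter; this needs the Python engine
df_sniffed = pd.read_csv(StringIO(data_semi), sep=None, engine='python')
print(df_sniffed)
```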

---

### Extra Example 3: Resetting the StringIO Pointer

```python
file_like.seek(0)  # Move the pointer back to the start position
df_again = pd.read_csv(file_like)
print("Re-read DataFrame:\n", df_again)
```

**Explanation:**
- **`seek(0)`:**  
  When you read from a file-like object, its pointer moves to the end. `seek(0)` moves the pointer back to the beginning so you can read all the data again.
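You can watch the pointer with `tell()`, which returns the current position. The sketch below (sample text is made up) prints the position before reading, after `read_csv`, and after `seek(0)`, then confirms that reading twice yields the same DataFrame.

```python
from io import StringIO
import pandas as pd

text = 'col1,col2\n1,2\n3,4\n'
buf = StringIO(text)
print(buf.tell())   # 0: the pointer starts at the beginning

first = pd.read_csv(buf)
print(buf.tell())   # the pointer has moved forward, past the parsed text

buf.seek(0)         # rewind to the beginning
print(buf.tell())   # 0 again
second = pd.read_csv(buf)
print(first.equals(second))  # True: the same data was read twice
```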

---

### Extra Example 4: Reading the Raw Content

```python
file_like.seek(0)
content = file_like.read()
print("Content of file-like object:\n", content)
```

**Explanation:**
- **`file_like.read()`:**  
  Reads the CSV string content from the current pointer position to the end; after `seek(0)` that is the entire content.
- **Usage:**  
  Useful when you want to verify exactly what content is stored in the file-like object.
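`StringIO` also has `getvalue()`, which returns the whole buffer no matter where the pointer is. A small sketch contrasting the two:

```python
from io import StringIO

buf = StringIO('col1,col2\n1,2\n')
first_read = buf.read()       # reads everything and moves the pointer to the end
second_read = buf.read()      # '' because nothing is left after the pointer
everything = buf.getvalue()   # full content, regardless of the pointer position
print(repr(second_read))      # ''
print(repr(everything))       # 'col1,col2\n1,2\n'
```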

---

## 2. Pandas read_csv: Reading CSV Files 📥📋

**Basic Usage:**  
If you have a real CSV file (such as 'mercedesbenz.csv'), you can easily read it into a DataFrame.

### Basic Example

```python
df = pd.read_csv('mercedesbenz.csv')
print(df.head())
```

**Explanation:**
- **File Reading:**  
  `pd.read_csv('mercedesbenz.csv')` reads the file and builds a DataFrame.
- **`.head()`:**  
  Returns the first 5 rows (the default), so you can take a quick look at the data.

> **Note:** If you don't have this file, you can create an in-memory CSV with StringIO instead (as in the examples above).
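To try the file-based workflow without the Mercedes-Benz dataset, the sketch below writes a tiny made-up CSV to a temporary directory and reads it back. The columns `ID`, `y`, and `X0` and their values are hypothetical stand-ins, not the real dataset. `head()` also accepts a row count.

```python
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    # Hypothetical sample data, written to disk as a real CSV file
    with open(path, 'w', newline='') as f:
        f.write('ID,y,X0\n1,10.5,a\n2,20.0,b\n3,30.25,c\n')
    df_file = pd.read_csv(path)

print(df_file.head(2))  # first 2 rows instead of the default 5
```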

---

### Extra Example 1: Reading with Specific Columns

```python
df_subset = pd.read_csv('mercedesbenz.csv', usecols=['X0', 'X1', 'X2', 'X3', 'X4', 'X5'])
print("Subset DataFrame:\n", df_subset.head())
```

**Explanation:**
- **`usecols`:**  
  Reads only the specified columns. This is useful when your CSV has many columns and you need only a few.
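Column names in `usecols` must match the header exactly. As a sketch with a made-up header, asking for a column that does not exist makes `read_csv` raise a `ValueError`:

```python
from io import StringIO
import pandas as pd

csv_text = 'X0,X1,X2\na,b,c\n'

# 'X9' is not in the header, so read_csv refuses the request
try:
    pd.read_csv(StringIO(csv_text), usecols=['X0', 'X9'])
    missing_raised = False
except ValueError as err:
    missing_raised = True
    print("Error:", err)
```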

---

### Extra Example 2: Writing DataFrame to CSV without Index

```python
df.to_csv('test.csv', index=False)
print("DataFrame saved to test.csv without index!")
```

**Explanation:**
- **`to_csv()`:**  
  Exports the DataFrame to a CSV file.
- **`index=False`:**  
  The index is not written to the file.
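Why `index=False` matters shows up when the file is read back. In the sketch below (a small made-up DataFrame), `to_csv()` with no path returns the CSV text as a string; when the index is written, it comes back as an extra column named `Unnamed: 0`.

```python
from io import StringIO
import pandas as pd

df_small = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# With no path, to_csv returns the CSV text as a string
with_index = df_small.to_csv()               # index written as an unnamed first column
without_index = df_small.to_csv(index=False)

cols_with = pd.read_csv(StringIO(with_index)).columns.tolist()
cols_without = pd.read_csv(StringIO(without_index)).columns.tolist()
print(cols_with)     # ['Unnamed: 0', 'a', 'b']
print(cols_without)  # ['a', 'b']
```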

---

### Extra Example 3: Reading CSV from a URL

```python
url = 'https://download.bls.gov/pub/time.series/cu/cu.item'
df_url = pd.read_csv(url, sep='\t')
print("DataFrame from URL:\n", df_url.head())
```

**Explanation:**
- **Reading from URL:**  
  You can read an online CSV file directly by passing its URL.
- **`sep='\t'`:**  
  This particular file is tab-delimited, so the separator must be specified.
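Remote files can be large, and sometimes a preview is enough. The `nrows` parameter limits how many data rows are parsed. The sketch runs offline on a small made-up tab-separated string; the same argument can be passed together with a URL.

```python
from io import StringIO
import pandas as pd

tsv = 'code\tname\nA1\tfirst\nA2\tsecond\nA3\tthird\n'

# nrows=2 parses only the first two data rows (the header is not counted)
preview = pd.read_csv(StringIO(tsv), sep='\t', nrows=2)
print(preview)
```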

---

## 3. Data Types and Converting Data with read_csv 🎛️🔍

**Concept:**  
When you read a CSV, pandas infers the data types automatically. You can also take manual control with the `dtype` parameter.

### Example without Specifying dtype

```python
data = ('a,b,c,d\n'
        '1,2,3,4\n'
        '5,6,7,8\n'
        '9,10,11,12')
df = pd.read_csv(StringIO(data))
df.info()  # info() prints its report itself (and returns None)
```

**Explanation:**
- **Inferred Data Types:**  
  Pandas automatically parses numeric columns as integers or floats; here all four columns become `int64`.
- **`info()`:**  
  Prints the DataFrame's structure and the dtype of each column.
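Inference depends on the actual values. As a sketch with made-up data: a single empty cell is read as `NaN`, and because `NaN` is a float, that column is inferred as `float64` while a complete integer column stays `int64`.

```python
from io import StringIO
import pandas as pd

gaps = ('a,b\n'
        '1,2\n'
        '3,\n')       # the second row has no value for 'b'

df_gaps = pd.read_csv(StringIO(gaps))
print(df_gaps.dtypes)  # 'a' is int64; 'b' is float64 because it holds NaN
```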

---

### Example with dtype='object'

```python
df_obj = pd.read_csv(StringIO(data), dtype='object')
df_obj.info()  # prints the report directly
```

**Explanation:**
- **`dtype='object'`:**  
  Reads every column as strings (`object` dtype) without converting anything to numbers. This is useful when you want to treat the values as labels (for example, categories or codes) and convert them yourself later.
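A typical case is a code with leading zeros, such as a postal code. In the sketch below (made-up values), default inference turns `02134` into the integer `2134`, while `dtype='object'` keeps the text exactly as written.

```python
from io import StringIO
import pandas as pd

zips = 'zip\n02134\n10001\n'

as_numbers = pd.read_csv(StringIO(zips))['zip'].tolist()
as_text = pd.read_csv(StringIO(zips), dtype='object')['zip'].tolist()
print(as_numbers)  # [2134, 10001]: the leading zero is lost
print(as_text)     # ['02134', '10001']
```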

---

### Extra Example 1: Specifying Different dtypes for Different Columns

```python
df_custom = pd.read_csv(StringIO(data), dtype={'a': int, 'b': float, 'c': int, 'd': int})
print("Custom dtypes:\n", df_custom.dtypes)
```

**Explanation:**
- **`dtype={'a': int, 'b': float, ...}`:**  
  Defines a specific data type for each column. This makes the conversion explicit, so later calculations don't run into unexpected types.
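An explicit `dtype` is also a check: if a value cannot be converted, `read_csv` raises a `ValueError` instead of silently guessing. A sketch with made-up data, where column `c` contains the word `three`:

```python
from io import StringIO
import pandas as pd

bad = 'a,c\n1,three\n2,5\n'

# Forcing column 'c' to int fails because 'three' is not a number
try:
    pd.read_csv(StringIO(bad), dtype={'a': int, 'c': int})
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print("ValueError:", err)
```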

---

### Extra Example 2: Using pd.to_numeric

```python
df_convert = pd.read_csv(StringIO(data))
df_convert['d'] = pd.to_numeric(df_convert['d'], errors='coerce')
print("Converted column 'd' types:\n", df_convert.dtypes)
```

**Explanation:**
- **`pd.to_numeric()`:**  
  Converts a column to a numeric type.  
- **`errors='coerce'`:**  
  Values that cannot be parsed become NaN instead of raising an error. Column `d` is already numeric here, so nothing changes; the option matters when a column contains text.
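To see `errors='coerce'` actually do something, the sketch below applies it to a made-up column of strings where one entry is not a number:

```python
import pandas as pd

raw = pd.Series(['1', 'three', '3'])
numbers = pd.to_numeric(raw, errors='coerce')  # 'three' becomes NaN
print(numbers.tolist())  # [1.0, nan, 3.0]
```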

---

### Extra Example 3: Checking Missing Values

```python
print("Missing values count:\n", df_convert.isnull().sum())
```

**Explanation:**
- **`isnull().sum()`:**  
  Counts how many missing (NaN) values each column contains.
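Since the example above has no missing values, every count is 0. The sketch below uses made-up data with gaps; calling `.sum()` a second time adds up the per-column counts into one total.

```python
from io import StringIO
import pandas as pd

data_gaps = ('a,b\n'
             '1,\n'
             ',2\n'
             '3,4\n')
df_na = pd.read_csv(StringIO(data_gaps))

per_column = df_na.isnull().sum()          # missing values in each column
total = int(df_na.isnull().sum().sum())    # missing values in the whole DataFrame
print(per_column)
print("Total missing:", total)
```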

---

### Extra Example 4: Reading CSV with Mixed Data

```python
data_mixed = ('a,b,c,d\n'
              '1,2,three,4\n'
              '5,6,7,8\n'
              '9,10,11,NaN')
df_mixed = pd.read_csv(StringIO(data_mixed), dtype={'a': int, 'b': int, 'c': str, 'd': float})
print("DataFrame with mixed types:\n", df_mixed)
```

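**Explanation:**
- **Mixed Data Handling:**  
  Some columns are numeric and some are text; with `dtype` you state explicitly which type each column should have. Column `c` contains the word "three", so it is read as `str`. The string `NaN` in column `d` is recognized as a missing value, which is why `d` is declared as `float` (a NumPy integer column cannot hold NaN).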
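Files often mark missing data with their own word instead of `NaN`. The `na_values` parameter adds extra markers to pandas' default list. A sketch, assuming a made-up file that uses the word `missing`:

```python
from io import StringIO
import pandas as pd

data_marked = ('a,d\n'
               '1,4\n'
               '5,missing\n')

# 'missing' is not a default NA marker, so it is added explicitly
df_marked = pd.read_csv(StringIO(data_marked), na_values=['missing'])
print(df_marked)
print(df_marked.dtypes)  # 'd' is float64 because it now holds NaN
```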

---

## 4. Indexing and Use of index_col and usecols 🏷️✅

**Using index_col:**  
You can make one of the columns the DataFrame's index, typically a column whose values identify each row.

### Basic Example

```python
data = ('index,a,b,c\n'
        '4,apple,bat,5.7\n'
        '8,orange,cow,10')
df_indexed = pd.read_csv(StringIO(data), index_col=0)
print("DataFrame with custom index:\n", df_indexed)
```

**Explanation:**
- **`index_col=0`:**  
  Makes the first column (with header "index") the index.
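Once a column is the index, rows are looked up by label with `.loc`, and the column's header becomes the index name. A short sketch reusing the data above:

```python
from io import StringIO
import pandas as pd

data = ('index,a,b,c\n'
        '4,apple,bat,5.7\n'
        '8,orange,cow,10')
df_indexed = pd.read_csv(StringIO(data), index_col=0)

print(df_indexed.index.name)    # 'index': the header becomes the index name
print(df_indexed.loc[8, 'a'])   # 'orange': lookup by index label, not by position
```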

---

**Using usecols:**  
Reads only specific columns.

```python
data2 = ('a,b,c\n'
         '4,apple,bat\n'
         '8,orange,cow')
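df_usecols = pd.read_csv(StringIO(data2), usecols=['a', 'b'], index_col=0)
print("DataFrame with selected columns:\n", df_usecols)
```

**Explanation:**
- **`usecols=['a', 'b']`:**  
  Only these columns are included in the DataFrame; column `c` is skipped.
- **`index_col=0`:**  
  When `usecols` is given, `index_col` counts positions within the selected columns, so `a` becomes the index.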
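The order of names inside `usecols` is ignored: columns always come back in the order they appear in the file. A sketch showing this, and reordering after the read when a specific order is needed:

```python
from io import StringIO
import pandas as pd

data2 = ('a,b,c\n'
         '4,apple,bat\n'
         '8,orange,cow')

df_order = pd.read_csv(StringIO(data2), usecols=['c', 'a'])
print(df_order.columns.tolist())            # ['a', 'c']: file order, not usecols order

df_reordered = df_order[['c', 'a']]         # select again to get the order you want
print(df_reordered.columns.tolist())        # ['c', 'a']
```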

---

### Extra Example 1: Both index_col and usecols

```python
data3 = ('index,a,b,c,d\n'
         '1,x,y,z,100\n'
         '2,a,b,c,200')
df_combo = pd.read_csv(StringIO(data3), index_col='index', usecols=['index', 'a', 'd'])
print("DataFrame with index and selected columns:\n", df_combo)
```

**Explanation:**
- **Combination:**  
  Here the "index" column becomes the index, and only the "index", "a", and "d" columns are read.

---

### Extra Example 2: Multi-index

```python
data_multi = ('id,group,value\n'
              '1,A,10\n'
              '2,A,15\n'
              '3,B,20\n'
              '4,B,25')
df_multi = pd.read_csv(StringIO(data_multi), index_col=['group', 'id'])
print("Multi-index DataFrame:\n", df_multi)
```

**Explanation:**
- **Multi-index:**  
  The two columns "group" and "id" together form a combined (hierarchical) index. This is useful for advanced indexing, such as selecting all rows of one group.
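With a hierarchical index, `xs()` selects rows by the value of one named level. The sketch below reuses the data above and selects by level name, so it does not depend on the order of the levels.

```python
from io import StringIO
import pandas as pd

data_multi = ('id,group,value\n'
              '1,A,10\n'
              '2,A,15\n'
              '3,B,20\n'
              '4,B,25')
df_multi = pd.read_csv(StringIO(data_multi), index_col=['group', 'id'])

group_a = df_multi.xs('A', level='group')   # all rows of group A
row_2 = df_multi.xs(2, level='id')          # the row whose id is 2
print(group_a)
print(row_2)
```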

---

### Extra Example 3: Selecting Columns by Number

```python
df_number = pd.read_csv(StringIO(data3), usecols=[1, 3])  # 2nd and 4th columns
print("Selected by column index:\n", df_number)
```

**Explanation:**
- **`usecols=[1, 3]`:**  
  Selects columns by zero-based position, so the 2nd and 4th columns (`a` and `c`) are read.

---

### Extra Example 4: index_col with dtype Specifications

```python
df_combo_dtype = pd.read_csv(StringIO(data3), index_col='index', dtype={'d': int})
print("DataFrame with index and specified dtype:\n", df_combo_dtype)
```

**Explanation:**
- **Combining Parameters:**  
  This lets you set the index and, at the same time, define the data type of specific columns.

---

## 5. Exporting Data: to_csv() 📤📝

**Concept:**  
The `to_csv()` method exports a DataFrame to a CSV file. Many options are available, such as a custom separator, custom headers, and append mode.

### Basic Example

```python
# Assuming 'df' is already created from previous examples
df.to_csv('test.csv', index=False)
print("DataFrame saved to 'test.csv' without index!")
```

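**Explanation:**
- **`index=False`:**  
  The index is not written to the file. Otherwise the default integer index is written as an extra unnamed first column, which usually shows up as `Unnamed: 0` when the file is read back.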
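By default, missing values are written as empty fields. The `na_rep` parameter chooses a different marker. In the sketch below (made-up data), missing values are written as `NA`. Because `NA` is one of `read_csv`'s default missing-value markers, the value comes back as NaN.

```python
from io import StringIO
import numpy as np
import pandas as pd

df_out = pd.DataFrame({'name': ['x', 'y'], 'score': [1.5, np.nan]})

text = df_out.to_csv(index=False, na_rep='NA')   # missing score written as 'NA'
print(text)

df_back = pd.read_csv(StringIO(text))
print(df_back['score'].isna().tolist())  # [False, True]
```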

---

### Extra Example 1: Saving Only Selected Columns

```python
# 'Column1' and 'Column2' are placeholders: replace them with real column names of df
df[['Column1', 'Column2']].to_csv('subset.csv', index=False)
print("Subset DataFrame saved to 'subset.csv'!")
```

**Explanation:**
- **Subset of Columns:**  
  Exports only the selected columns.
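`to_csv()` can also pick the columns itself through its `columns` parameter, without slicing the DataFrame first. A sketch with a made-up three-column DataFrame; with no path, the CSV text is returned as a string.

```python
import pandas as pd

df_wide = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

text = df_wide.to_csv(columns=['a', 'c'], index=False)
print(text.splitlines())  # ['a,c', '1,5', '2,6']
```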

---

### Extra Example 2: Saving with a Custom Separator

```python
df.to_csv('test_semicolon.csv', sep=';', index=False)
print("DataFrame saved with semicolon delimiter to 'test_semicolon.csv'!")
```

**Explanation:**
- **`sep=';'`:**  
  Uses a semicolon instead of a comma to separate the columns in the CSV file.
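Semicolon-separated files often come with a comma as the decimal mark, a style common in many European locales. Both `to_csv` and `read_csv` have a `decimal` parameter for this. A round-trip sketch with made-up prices:

```python
from io import StringIO
import pandas as pd

df_eu = pd.DataFrame({'item': ['tea', 'milk'], 'price': [1.5, 0.25]})

# ';' between fields, ',' as the decimal mark
text = df_eu.to_csv(sep=';', decimal=',', index=False)
print(text.splitlines())  # ['item;price', 'tea;1,5', 'milk;0,25']

# Reading it back needs the same two settings
df_back = pd.read_csv(StringIO(text), sep=';', decimal=',')
print(df_back['price'].tolist())  # [1.5, 0.25]
```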

---

### Extra Example 3: Exporting with Renamed Headers

```python
# 'Column1' and 'Column2' are placeholders: replace them with real column names of df
df.rename(columns={'Column1': 'C1', 'Column2': 'C2'}).to_csv('renamed.csv', index=False)
print("DataFrame saved with renamed columns to 'renamed.csv'!")
```

**Explanation:**
- **Renaming Columns:**  
  Renames the headers before exporting, which can make the file clearer. `rename()` returns a new DataFrame, so `df` itself keeps its original column names.
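Another way to rename columns only in the output file is to pass a list of aliases to `header`. A sketch with a made-up DataFrame whose column names match the placeholders above:

```python
import pandas as pd

df_cols = pd.DataFrame({'Column1': [1, 2], 'Column2': [3, 4]})

# header accepts aliases: the names change only in the written text
text = df_cols.to_csv(header=['C1', 'C2'], index=False)
print(text.splitlines()[0])      # C1,C2
print(df_cols.columns.tolist())  # ['Column1', 'Column2']: the DataFrame is unchanged
```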

---

### Extra Example 4: Appending Data to an Existing CSV

```python
df.to_csv('test_append.csv', mode='a', header=False, index=False)
print("Data appended to 'test_append.csv'!")
```

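**Explanation:**
- **`mode='a'`:**  
  Append mode adds the rows to the end of the file without overwriting the existing data. If the file does not exist yet, it is created.
- **`header=False`:**  
  Does not write the header row again when appending new rows. (A file created this way has no header row at all.)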
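A common pattern is to write the header only when the file does not exist yet, so the first write creates a proper header and later appends skip it. A sketch using a temporary directory and a hypothetical file name `log.csv`:

```python
import os
import tempfile
import pandas as pd

batch = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.csv')  # hypothetical output file
    for _ in range(2):
        # Header only on the first write, when the file does not exist yet
        batch.to_csv(path, mode='a', header=not os.path.exists(path), index=False)
    combined = pd.read_csv(path)

print(combined)  # one header row, 4 data rows
```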

---

## Summary & Key Points 🔑💡

- **StringIO:**  
  - **Usage:** In-memory file operations, e.g. testing CSV parsing without a real file.
  - **Key Methods:** `seek(0)` to reset the pointer, `.read()` to get the content.

- **pd.read_csv():**  
  - **Usage:** The primary function for loading CSV data into a DataFrame.
  - **Parameters:** `usecols`, `index_col`, `sep`, `dtype`: together they let you customize how the data is read.

- **Data Types:**  
  - Pandas infers data types automatically, but you can control them with the `dtype` parameter.
  - `pd.to_numeric` is useful for conversions.

- **Indexing & Subsetting:**  
  - `index_col` and `usecols` let you shape the DataFrame while reading, keeping it smaller and easier to work with.
  - Multi-indexing is useful for organizing data hierarchically.

- **Exporting Data:**  
  - `to_csv()` saves a DataFrame to a CSV file.
  - Options such as custom separators, renamed headers, and append mode make exporting flexible.

---

