# Assignment 3: File Formats and I/O (Chapters 3.3, 6.1-6.2)

This assignment covers topics from Lectures 5 and 6: file formats and I/O.

Complete the code segments according to the instructions, and use Markdown to add comments or answer the questions.

## Topics Covered

- Built-in file I/O
- Open, close, and flush operations
- Read, write, and append modes
- Text and binary formats
- Encoding formats
- Pickle files
- Pandas I/O
- DataFrames
- Missing values
- Common file formats
   - CSV
   - JSON
   - HDF
   - XML
- Web scraping
   - HTML
   - XML
- API interaction
- Database interaction

## Question 1

Write a Python program to create a text file named <b>info.txt</b> and write the following lines into it:

```text
Name: Alice
Age: 25
City: New York
```

Make sure the file is properly closed after writing.

## Solution
```Python
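data = {"Name": "Alice", "Age": 25, "City": "New York"}
# The with statement closes the file automatically, even if an error occurs
with open("info.txt", "w", encoding="utf-8") as file:
    for key, value in data.items():
        file.write(f"{key}: {value}\n")  # one "Key: value" line per entry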
```
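To confirm that leaving the `with` block really closes the file, the sketch below repeats the solution and then inspects `file.closed` and reads the contents back. It writes into a fresh temporary directory (an assumption, so the check does not overwrite a real `info.txt`) and also shows the equivalent manual pattern, where `close()` sits in a `finally` block so it runs even if a write fails. The file name `info_manual.txt` is hypothetical.

```python
import tempfile
from pathlib import Path

data = {"Name": "Alice", "Age": 25, "City": "New York"}

# Assumption: write into a temporary directory instead of the working directory
path = Path(tempfile.mkdtemp()) / "info.txt"

with open(path, "w", encoding="utf-8") as file:
    for key, value in data.items():
        file.write(f"{key}: {value}\n")

# After the with block ends, the file object reports that it is closed
print(file.closed)

# Read the file back to check what was written
contents = path.read_text(encoding="utf-8")
print(contents, end="")

# Equivalent without a with statement: close() in a finally block
# (info_manual.txt is a hypothetical second file used only for comparison)
manual_path = path.with_name("info_manual.txt")
manual_file = open(manual_path, "w", encoding="utf-8")
try:
    for key, value in data.items():
        manual_file.write(f"{key}: {value}\n")
finally:
    manual_file.close()
```

The `with` form is generally preferred because it cannot forget the `close()` call.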

## Question 2

What are the advantages of using Pandas for IO operations compared to Python’s built-in file handling methods?

Are there any limitations to using pandas over the basic I/O functions?

## Solution

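|Feature|Pandas I/O|Python built-in I/O|
|---|---|---|
|Ease of use|High-level functions such as `read_csv()` and `to_csv()`|Requires manual parsing and formatting|
|Structured data|Loaded directly into a labeled `DataFrame` with inferred column types|Raw strings or lists; types must be converted by hand|
|Missing data handling|Recognizes common missing-value markers as `NaN`; tools such as `fillna()` and `dropna()`|Must be detected and handled manually|
|File format support|Many formats (CSV, Excel, JSON, HDF5, SQL, etc.)|Text and binary files; structured formats need manual parsing or modules such as `csv` and `json`|
|Performance for small tasks|Import and parsing overhead can make it slower for small files|Faster for simple tasks|
|Custom formats|Limited to layouts its readers understand|Full control over parsing|
|Streaming / real-time processing|Reads the whole file into memory by default (`chunksize` allows batch reading)|Can process data line by line as it arrives|
|Dependency|Requires installing pandas|Built into Python|

In short, pandas is the better fit for tabular data, while built-in I/O remains preferable for small scripts, unusual formats, streaming data, and environments where adding a dependency is not an option.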
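The "Structured data" and "Missing data handling" rows can be seen by parsing the same small CSV both ways. This sketch uses a hypothetical three-row dataset held in an `io.StringIO` buffer, so no file is needed, and also shows `chunksize` as one way around the memory limitation listed in the table.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data with one missing age
text = "Name,Age\nAlice,25\nBob,\nCara,31\n"

# Built-in approach: every field is a string; conversion and
# missing-value handling are up to us
rows = list(csv.DictReader(io.StringIO(text)))
ages_builtin = [int(r["Age"]) if r["Age"] else None for r in rows]

# pandas approach: column types are inferred and the blank field becomes NaN
df = pd.read_csv(io.StringIO(text))
print(type(rows[0]["Age"]))      # str from the csv module
print(df["Age"].dtype)           # float64, because NaN forces a float column
print(df["Age"].isna().sum())    # number of missing ages

# Workaround for large files: read in batches of 2 rows at a time
total = sum(chunk["Age"].sum() for chunk in pd.read_csv(io.StringIO(text), chunksize=2))
print(total)
```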



## Question 3

Give an example of a use case for <b>pickle</b> files.

What are some disadvantages of using <b>pickle</b> files?


## Solution

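#### When to Use Pickle Files
- You need to save and load Python-specific objects (e.g., machine learning models, dictionaries, custom classes). For example, a trained model can be pickled once and reloaded later to make predictions without retraining.
- You work in a Python-only environment.
- You trust the source of the pickle file.

#### When to Avoid Pickle Files (Disadvantages)
- Security is a concern: unpickling can execute arbitrary code, so never load pickle files from untrusted sources (use safer formats like JSON or CSV).
- You need cross-language compatibility: pickle is a Python-specific binary format and is not human-readable.
- You need long-term storage: a pickle may fail to load after the class definitions or library versions it depends on change.
- You’re working with large datasets (use formats like HDF5 or Parquet).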
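A minimal round-trip sketch of the use case above: a hypothetical dictionary holding Python-specific types (a `set` and a `tuple`) is pickled to a temporary file and loaded back unchanged. The same object cannot be written with `json`, which illustrates the cross-language trade-off.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical object containing types JSON cannot represent exactly
state = {"vocab": {"cat", "dog"}, "shape": (2, 3), "epoch": 7}

# Assumption: use a temporary directory for the pickle file
path = Path(tempfile.mkdtemp()) / "state.pkl"

with open(path, "wb") as f:      # pickle is a binary format
    pickle.dump(state, f)

with open(path, "rb") as f:      # only load pickle files you trust
    restored = pickle.load(f)

print(restored == state)         # the set and tuple survive the round trip

# JSON has no set type, so serializing the same object fails
try:
    json.dumps(state)
    json_supported = True
except TypeError as exc:
    json_supported = False
    print("JSON cannot store this object:", exc)
```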

## Question 4

#### Select Specific Columns While Reading

Write a Python program to read a CSV file named <b>data.csv</b> into a Pandas DataFrame. 

Select only the columns <b>Name</b> and <b>Age</b> while reading the file.

Hint: Use the `usecols` parameter in `read_csv()`.

## Solution
```Python
import pandas as pd
# Read the CSV file and select only the columns 'Name' and 'Age'
df = pd.read_csv("assignment_files/data.csv", usecols=["Name", "Age"])
print(df)
```
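A few details of `usecols` are worth checking on a small example: the selected columns come back in file order rather than the order listed, a name missing from the header raises `ValueError`, and a callable can be used for case-insensitive matching. The CSV text below is a hypothetical stand-in for `data.csv`, whose actual contents are not shown in the assignment.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv
csv_text = "Name,Age,City,Salary\nAlice,25,New York,70000\nBob,30,Boston,65000\n"

# Order inside usecols is ignored: columns keep their order from the file
df_file_order = pd.read_csv(io.StringIO(csv_text), usecols=["Age", "Name"])
print(list(df_file_order.columns))

# Reorder explicitly after reading if a specific order is needed
df_reordered = df_file_order[["Age", "Name"]]

# A callable receives each header name and keeps the column if it returns True
df_callable = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col.lower() in {"name", "age"})

# Asking for a column that is not in the file raises ValueError
try:
    pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Email"])
    missing_raises = False
except ValueError as exc:
    missing_raises = True
    print("Column not found:", exc)
```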

## Question 5

#### Handle Missing Values While Reading

Write a Python program to read a CSV file named <b>data_with_missing.csv</b> into a Pandas <b>DataFrame</b>. 

Replace all missing values (denoted by empty strings) with the default <b>NaN</b> and display the updated DataFrame.

Try to do this within the <b>read_csv()</b> function.

## Solution
```Python
import pandas as pd
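# Read the CSV file into a Pandas DataFrame; empty fields are parsed as NaN.
# pandas already treats "" as missing by default, so na_values=[""] makes that explicit.
df = pd.read_csv("assignment_files/data_with_missing.csv", na_values=[""])
print(df)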
```
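Because the empty string is already one of pandas' default missing-value markers, passing `na_values=[""]` gives the same result as the default call. The sketch below checks this on a hypothetical stand-in for `data_with_missing.csv`, shows that `keep_default_na=False` turns the defaults off (blank fields then stay as empty strings), and fills one column with `fillna()`; the `"Unknown"` placeholder is an arbitrary choice for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for data_with_missing.csv (one blank field per column)
csv_text = "Name,Age,City\nAlice,25,\nBob,,Boston\n,40,Chicago\n"

df_default = pd.read_csv(io.StringIO(csv_text))
df_explicit = pd.read_csv(io.StringIO(csv_text), na_values=[""])

# Both calls mark the same cells as NaN
print(df_default.isna().sum())
print(df_default.equals(df_explicit))

# With the default markers disabled, blank fields are kept as "" instead of NaN
df_raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# Replace NaN in one column with a placeholder value
df_filled = df_default.fillna({"City": "Unknown"})
print(df_filled)
```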