# Data Wrangling & File Handling in Various Formats in Python

Reading and writing text, TSV, and JSON files in Python

### Introduction to File Formats

1. Text Files (.txt):
    - Plain text without structured formatting.
    - Used for simple data storage (e.g., logs, notes).
    - Handled via Python's built-in open() function.

#### Writing to a text file using the built-in open() function

In [1]:
with open('example_test.txt', 'w') as file:
    file.write('This is the sample file for text.\n')
print('file created successfully')

file created successfully


#### Key Concepts:

1. open() - Opens a file in write ('w'), read ('r'), append ('a'), or another mode.
2. write() - Writes a string to the file.
3. with statement - Ensures the file is closed after the block finishes, even if an error occurs.
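
As a minimal sketch of the three points above, the snippet below writes to, appends to, and then reads a scratch file, and checks that the `with` statement closed it. The file name `modes_demo.txt` and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, so nothing is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as file:      # 'w' creates the file or truncates an existing one
    file.write('first\n')

with open(path, 'a') as file:      # 'a' appends to the end of the file
    file.write('second\n')

with open(path, 'r') as file:      # 'r' opens the file for reading
    content = file.read()

print(content)
print(file.closed)  # the with block closed the file when it exited
```

Because every `open()` call sits inside a `with` block, no explicit `file.close()` is needed.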


In [None]:
# writing multiple lines to a file
with open('example_test.txt', 'a') as file: # here 'a' is for append mode
    lines = ['First line of text.\n', 'Second line of text.\n', 'Third line of text.\n']
    file.writelines(lines)
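
Note that `writelines()` does not add line breaks on its own, which is why each string in the list above ends with `\n`. A small sketch of the difference, using a made-up word list and a hypothetical scratch file:

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'writelines_demo.txt')

words = ['alpha', 'beta', 'gamma']

with open(path, 'w') as file:
    file.writelines(words)                      # strings are written back to back
    file.write('\n')
    file.writelines(w + '\n' for w in words)    # append the newline yourself

with open(path, 'r') as file:
    result = file.read().splitlines()

print(result)
```

The first call produces a single run-together line, while the second produces one word per line.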


In [13]:
#read from created file 
with open('example_test.txt', 'r') as file:
    content = file.read()
    print(content)

This is line number 1
This is line number 2
This is line number 3
This is line number 4
This is line number 5
This is line number 6
First line of text.
Second line of text.
Third line of text.



In [None]:
with open('example_test.txt', 'w') as file: # 'w' mode to overwrite the file
    for i in range(6):
        file.write(f'This is line number {i+1}\n') # f string for formatted text


In [16]:
#read from created file 
with open('example_test.txt', 'r') as file:
    content = file.read()
    print(content)

# Opening a file that does not exist in 'r' mode raises FileNotFoundError,
# but opening it in 'w' mode creates a new file.

This is line number 1
This is line number 2
This is line number 3
This is line number 4
This is line number 5
This is line number 6
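
The behaviour for missing files noted in the cell above can be sketched as follows. The path `does_not_exist.txt` is a placeholder chosen for illustration; the error is caught with `try`/`except` so the program can continue.

```python
import os
import tempfile

# Hypothetical path that does not exist yet.
missing = os.path.join(tempfile.mkdtemp(), 'does_not_exist.txt')

try:
    with open(missing, 'r') as file:
        content = file.read()
except FileNotFoundError:
    content = None
    print('File not found:', os.path.basename(missing))

# Opening the same path in 'w' mode creates the file instead of failing.
with open(missing, 'w') as file:
    file.write('created\n')

print(os.path.exists(missing))
```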



#### Example of writing multiple lines 

In [22]:
lines = ['First line of text.',
         'Second line of text.',
         'Third line of text']
with open('example_test.txt', 'a') as file: # here 'a' is for append mode
    for line in lines:
        file.write(line + '\n')

# \n is used to add a new line after each line while writing

In [21]:
with open('example_test.txt', 'r') as file:
    content = file.read()
    print(content)

# Read the file back to check the lines written from the list with the loop

This is line number 1
This is line number 2
This is line number 3
This is line number 4
This is line number 5
This is line number 6
First line of text.
Second line of text.
Third line of text.
First line of text.
Second line of text.
Third line of text



#### Python code to read a text file from a specific location

In [24]:
with open('example_test.txt', 'r') as file:
    lines = file.readlines()

for line in lines:
    print(line.strip())  # Using strip() to remove extra newlines or blank spaces

This is line number 1
This is line number 2
This is line number 3
This is line number 4
This is line number 5
This is line number 6
First line of text.
Second line of text.
Third line of text.
First line of text.
Second line of text.
Third line of text
First line of text.
Second line of text.
Third line of text
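
When the file lives outside the current working directory, one option is to build the path explicitly. The following is a minimal sketch using the standard-library `pathlib` module; the temporary folder and the file name `notes.txt` are placeholders, not paths from this notebook.

```python
import tempfile
from pathlib import Path

# Hypothetical location: a temporary folder stands in for a real directory.
folder = Path(tempfile.mkdtemp())
file_path = folder / 'notes.txt'
file_path.write_text('  line one  \nline two\n\n', encoding='utf-8')

# Iterating over the file object reads one line at a time.
cleaned = []
with open(file_path, 'r', encoding='utf-8') as file:
    for line in file:
        stripped = line.strip()   # drops leading/trailing whitespace, including '\n'
        if stripped:              # skip lines that are blank after stripping
            cleaned.append(stripped)

print(cleaned)
```

Iterating over the file directly avoids loading every line into a list first, which can matter for large files.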


#### TSV (Tab-Separated Values):
1. Stores tabular data with tabs as delimiters.
2. Similar to CSV but uses tabs instead of commas.
3. Ideal for datasets with fields containing commas.
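
Before moving to pandas, here is a hedged sketch of the same idea with only the standard-library `csv` module. The records and the file name `places.tsv` are made up for illustration; the point is that a field containing a comma survives a write/read round trip when tabs are the delimiter.

```python
import csv
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'places.tsv')

rows = [
    {'Name': 'Alice', 'Address': '12 Main St, Springfield'},
    {'Name': 'Bob', 'Address': '7 Oak Ave, Portland'},
]

# newline='' lets the csv module control line endings itself.
with open(path, 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=['Name', 'Address'], delimiter='\t')
    writer.writeheader()
    writer.writerows(rows)

with open(path, 'r', newline='') as file:
    loaded = list(csv.DictReader(file, delimiter='\t'))

print(loaded[0]['Address'])
```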

In [25]:
# Writing the dataframe to a TSV File
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

In [26]:
data

{'Name': ['Alice', 'Bob', 'Charlie'],
 'Age': [25, 30, 35],
 'City': ['New York', 'Los Angeles', 'Chicago']}

In [27]:
df = pd.DataFrame(data)
df

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |


In [28]:
df.to_csv('people.tsv', sep='\t', index=False)

In [29]:
#read the file to verify
df_read = pd.read_csv('people.tsv', sep='\t')   
df_read

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |


#### Key Concepts
- sep='\t': Specifies tab as the delimiter.
- index=False: Leaves the row index out of the file. The default is index=True, which writes it as the first column.
- index=True: Use this if your DataFrame has a custom index (e.g., dates, employee IDs) that you want to preserve.
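
To illustrate the `index=True` case, the sketch below writes a DataFrame with a custom index to an in-memory buffer and reads it back. The employee IDs, the index name `EmployeeID`, and the variable names are assumptions made for this example; `io.StringIO` stands in for a real `.tsv` file.

```python
import io

import pandas as pd

# Hypothetical data with employee IDs as a custom index.
staff = pd.DataFrame(
    {'Name': ['Alice', 'Bob'], 'Age': [25, 30]},
    index=pd.Index(['E01', 'E02'], name='EmployeeID'),
)

buffer = io.StringIO()                     # in-memory stand-in for a .tsv file
staff.to_csv(buffer, sep='\t', index=True)
tsv_text = buffer.getvalue()
print(tsv_text)

# index_col=0 turns the first column back into the index when reading.
restored = pd.read_csv(io.StringIO(tsv_text), sep='\t', index_col=0)
print(restored.index.tolist())
```

Without `index_col=0`, the saved index would come back as an ordinary data column.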


#### When to use which?
- Use CSV when:
    - Working with Excel or Google Sheets (CSV is widely supported for import and export).
    - Data is simple (no commas in text).
- Use TSV when:
    - Data contains commas, quotes, or multi-line text.
    - Processing logs, NLP data, or large datasets.
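
A quick way to see the difference is to serialize the same record with both delimiters. This is only a sketch: the record is made up, and `serialize` is a hypothetical helper built on the standard-library `csv.writer`.

```python
import csv
import io

# Made-up record; the second field contains a comma.
row = ['Alice', 'New York, NY', 'likes tea']


def serialize(fields, delimiter):
    """Return one delimited line for `fields` using the csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator='\n').writerow(fields)
    return buffer.getvalue()


csv_line = serialize(row, ',')
tsv_line = serialize(row, '\t')

print(repr(csv_line))  # the comma-containing field is wrapped in quotes
print(repr(tsv_line))  # no quoting is needed with a tab delimiter
```

CSV still handles the comma correctly, but only by quoting the field; with tabs the text passes through unchanged.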

## JSON (JavaScript Object Notation):
- Lightweight data interchange format.
- Uses key-value pairs and nested structures.
- Commonly used for APIs and configuration files.
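
As a small illustration of key-value pairs and nesting, the sketch below converts a nested Python dictionary to a JSON string and back. The configuration values are invented for the example.

```python
import json

# Made-up configuration with nested structures.
config = {
    'service': 'demo',
    'retries': 3,
    'endpoints': [
        {'name': 'users', 'secure': True},
        {'name': 'health', 'secure': False},
    ],
    'owner': None,
}

text = json.dumps(config)      # Python object -> JSON string
restored = json.loads(text)    # JSON string -> Python object

print(text)
print(restored['endpoints'][0]['name'])
```

In the JSON text, Python's `True`, `False`, and `None` appear as `true`, `false`, and `null`.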

In [30]:
# Creating a JSON file with one object per line
import json

data = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35, "City": "Chicago"}       
]

# Writing to a JSON file
with open('people.json', 'w') as file:
    for entry in data:
        json.dump(entry, file)
        file.write('\n')  # write each JSON object on a new line

In [32]:
# JSON functions
# - data: The dictionary or list of dictionaries to be written to the file.
# - json.dump(): Serializes a Python object and writes it to a file.
# - indent: Optional json.dump() parameter giving the number of spaces used for
#   indentation (e.g., indent=4), which makes the output more readable. It is not
#   used above because each object has to stay on a single line.
# Reading from a JSON file
# - file: The file object opened in read mode.
# - json.loads(): Parses each line (a JSON string) back into a Python dictionary.
with open('people.json', 'r') as file:
    for line in file:
        entry = json.loads(line)  # parse each line as a JSON object
        print(entry)

{'Name': 'Alice', 'Age': 25, 'City': 'New York'}
{'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'}
{'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
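
For comparison, the whole list can also be stored as a single, indented JSON document and read back in one call. This is a sketch under assumptions: the file name `people_pretty.json` and the temporary directory are placeholders, and the records are a shortened copy of the list used above.

```python
import json
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'people_pretty.json')

people = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
]

with open(path, 'w') as file:
    json.dump(people, file, indent=4)   # one JSON document, pretty-printed

with open(path, 'r') as file:
    loaded = json.load(file)            # parses the whole file at once

print(loaded == people)
```

Here `json.load()` reads from a file object, whereas `json.loads()` parses a string, which is why the line-by-line version uses the latter.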



In [38]:
# Print each record from the in-memory `data` list in a readable format.
for em in data:
    print('Name:', em['Name'], ', Age:', em['Age'], ', City:', em['City'])


Name: Alice , Age: 25 , City: New York
Name: Bob , Age: 30 , City: Los Angeles
Name: Charlie , Age: 35 , City: Chicago


#### Key Concepts:
- json.loads() - Parses a JSON string into a Python object (here, a dictionary).

- JSON - ideal for APIs and configs.
- TXT  - best for logs and notes.
- CSV  - best for Excel data.
- TSV  - best for text with commas.