# Introduction to CSV and Pandas
This notebook shows how to read a CSV file in two ways:
1. **Using Python's built-in `csv` module**
2. **Using the `pandas` library**
We'll also see how to examine the data once it's loaded.

In [2]:
# Reading data using Python's built-in csv module
import csv

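# newline='' lets the csv module handle line endings itself, as its documentation recommends
with open('students.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)  # iterator that yields one dict per row, keyed by the header line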
    data_csv = list(reader)

print('First record using csv module:', data_csv[0])

First record using csv module: {'Name': 'Alice', 'Age': '14', 'Grade': 'A'}
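
Note that `csv.DictReader` returns every value as a string (`'14'`, not `14`). A minimal sketch of converting the `Age` field by hand is shown here. It uses an inline copy of two records rather than the real `students.csv`, so the inline text and the name `records` are assumptions made for illustration:

```python
import csv
import io

# Inline stand-in for students.csv (assumed layout: Name,Age,Grade)
sample = "Name,Age,Grade\nAlice,14,A\nBob,15,B\n"

reader = csv.DictReader(io.StringIO(sample))

# Convert Age to int while keeping the other fields as strings
records = [{**row, "Age": int(row["Age"])} for row in reader]

print(records[0])
```

With the `csv` module, this kind of type conversion is always the caller's job.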


## Reading Data with Pandas
`pandas` loads the whole file into a `DataFrame`, a labeled table, and infers column types along the way (for example, `Age` becomes an integer rather than the string `'14'`).

In [None]:
# Reading data using pandas

import pandas as pd

df = pd.read_csv("students.csv")
print('First 5 records using pandas:')
df.head()  # head() returns the first 5 rows by default

First 5 records using pandas:


      Name  Age  Grade
0    Alice   14      A
1      Bob   15      B
2  Charlie   14      C
3    David   15      A
4      Eva   14      B
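
Beyond `head()`, a few attributes help examine the data once it is loaded. The sketch below is illustrative only: it rebuilds the five records shown above from an inline string (an assumption about the layout of `students.csv`), and `df_sample` is a hypothetical name chosen so the example does not depend on the file being present.

```python
import io
import pandas as pd

# Inline stand-in for students.csv, copied from the records shown above
sample = (
    "Name,Age,Grade\n"
    "Alice,14,A\n"
    "Bob,15,B\n"
    "Charlie,14,C\n"
    "David,15,A\n"
    "Eva,14,B\n"
)
df_sample = pd.read_csv(io.StringIO(sample))

print(df_sample.shape)          # (number of rows, number of columns)
print(df_sample.dtypes)         # inferred type of each column
print(df_sample["Age"].mean())  # numeric columns support arithmetic directly
```

Because `Age` is inferred as an integer column, no manual conversion is needed before computing on it.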


In [4]:
# Reading a specific column
df['Name']

0      Alice
1        Bob
2    Charlie
3      David
4        Eva
Name: Name, dtype: object

In [5]:
# Accessing a specific row by position (iloc is zero-based, so 2 is the third row)
df.iloc[2]

Name     Charlie
Age           14
Grade          C
Name: 2, dtype: object

In [None]:
# Filtering rows: keep only the records whose Name is 'Alice'
alice = df[df['Name'] == 'Alice']
alice

    Name  Age  Grade
0  Alice   14      A
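
The filter works because a comparison such as `df['Name'] == 'Alice'` produces a boolean Series, and indexing with that Series keeps only the rows marked `True`. A short sketch of the idea follows, again using an inline copy of the student records (an assumption) and the hypothetical names `df_sample`, `mask`, and `young_top`:

```python
import io
import pandas as pd

# Inline stand-in for students.csv, copied from the records shown earlier
sample = (
    "Name,Age,Grade\n"
    "Alice,14,A\n"
    "Bob,15,B\n"
    "Charlie,14,C\n"
    "David,15,A\n"
    "Eva,14,B\n"
)
df_sample = pd.read_csv(io.StringIO(sample))

# The comparison yields one True/False value per row
mask = df_sample["Name"] == "Alice"
print(mask.tolist())

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
young_top = df_sample[(df_sample["Age"] == 14) & (df_sample["Grade"] == "A")]
print(young_top)
```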


## Handling Inconsistent Lines in the Kaggle Olympics Dataset

Here is an example of inconsistent lines in the dataset:

- **Normal Lines**:
    ```
    M,110M Hurdles Men,Rio,2016,S,Orlando ORTEGA,ESP,13.17
    M,110M Hurdles Men,Rio,2016,B,Dimitri BASCOU,FRA,13.24
    ```

- **Inconsistent Lines**:
    ```
    M,110M Hurdles Men,Beijing,2008,G,Dayron ROBLES,CUB,12.93,+0.1
    M,110M Hurdles Men,Beijing,2008,S,David PAYNE,USA,13.17,+0.1
    M,110M Hurdles Men,Beijing,2008,B,David OLIVER,USA,13.18,+0.1
    ```

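The inconsistent lines have a ninth field at the end (e.g., `+0.1`), while normal lines have eight. By default, `pd.read_csv` raises a parser error when it meets a line with too many fields, so the next cell passes `on_bad_lines="skip"` to drop those lines instead.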
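
One way to spot such lines before loading is to count the fields in each row with the `csv` module. The sketch below runs on an inline sample built from the lines quoted above. The header row is taken from the column names pandas displays for this dataset, and `field_counts` is a hypothetical name:

```python
import csv
import io
from collections import Counter

# Inline sample: header plus the normal and inconsistent lines quoted above
sample = (
    "Gender,Event,Location,Year,Medal,Name,Nationality,Result\n"
    "M,110M Hurdles Men,Rio,2016,S,Orlando ORTEGA,ESP,13.17\n"
    "M,110M Hurdles Men,Rio,2016,B,Dimitri BASCOU,FRA,13.24\n"
    "M,110M Hurdles Men,Beijing,2008,G,Dayron ROBLES,CUB,12.93,+0.1\n"
    "M,110M Hurdles Men,Beijing,2008,S,David PAYNE,USA,13.17,+0.1\n"
    "M,110M Hurdles Men,Beijing,2008,B,David OLIVER,USA,13.18,+0.1\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # consume the header so only data rows are counted

# Map "number of fields" -> "how many rows have that many fields"
field_counts = Counter(len(row) for row in reader)

print(len(header), dict(field_counts))
```

Any count other than the header's length points to rows that need attention.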

In [7]:
# on_bad_lines="skip" drops lines with too many fields instead of raising an error
olympics_df = pd.read_csv("results.csv", on_bad_lines="skip")
olympics_df.head()

   Gender       Event  Location  Year  Medal                   Name  Nationality    Result
0       M  10000M Men       Rio  2016      G          Mohamed FARAH          USA  25:05.17
1       M  10000M Men       Rio  2016      S  Paul Kipngetich TANUI          KEN  27:05.64
2       M  10000M Men       Rio  2016      B           Tamirat TOLA          ETH  27:06.26
3       M  10000M Men   Beijing  2008      G        Kenenisa BEKELE          ETH  27:01.17
4       M  10000M Men   Beijing  2008      S         Sileshi SIHINE          ETH  27:02.77
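
Keep in mind that `on_bad_lines="skip"` discards the affected rows, so results such as the Beijing 2008 hurdles medals quoted earlier would be missing from the DataFrame. A possible alternative, sketched below, is to pass a callable that trims the extra field. This assumes pandas 1.4 or later and `engine="python"`, which the callable form requires. The helper `trim_extra_fields`, the constant `EXPECTED_FIELDS`, and the inline sample are hypothetical and stand in for `results.csv`:

```python
import io
import pandas as pd

# Inline sample: header plus the normal and inconsistent lines quoted earlier
sample = (
    "Gender,Event,Location,Year,Medal,Name,Nationality,Result\n"
    "M,110M Hurdles Men,Rio,2016,S,Orlando ORTEGA,ESP,13.17\n"
    "M,110M Hurdles Men,Rio,2016,B,Dimitri BASCOU,FRA,13.24\n"
    "M,110M Hurdles Men,Beijing,2008,G,Dayron ROBLES,CUB,12.93,+0.1\n"
    "M,110M Hurdles Men,Beijing,2008,S,David PAYNE,USA,13.17,+0.1\n"
    "M,110M Hurdles Men,Beijing,2008,B,David OLIVER,USA,13.18,+0.1\n"
)

EXPECTED_FIELDS = 8  # number of columns in the header


def trim_extra_fields(bad_line):
    """Receive a bad line as a list of strings and keep only the expected fields."""
    return bad_line[:EXPECTED_FIELDS]


# Skipping drops the three 9-field rows
skipped = pd.read_csv(io.StringIO(sample), on_bad_lines="skip")

# The callable keeps them, minus the trailing extra value
kept = pd.read_csv(
    io.StringIO(sample),
    on_bad_lines=trim_extra_fields,
    engine="python",
)

print(len(skipped), len(kept))
```

Whether trimming is appropriate depends on what the extra value means. If it carries information worth keeping, it may be better to preprocess the file and give that value its own column.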


As these examples show, pandas is concise: a few lines are enough to load a file, select columns and rows, filter records, and skip malformed lines.

_End of notebook_