# Import Pandas
- Once pandas is installed, import it into your application with the `import` statement:

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Read CSV Files

In [None]:
df = pd.read_csv('../data/raw/accident_region.csv')

In [None]:
print(df.to_string())
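
Later sections of this notebook replace `(Blank)` placeholders and strip thousands separators by hand. As an optional alternative, `read_csv()` can handle both at read time through its `na_values` and `thousands` parameters. The sketch below uses a small in-memory CSV with hypothetical column names (`region`, `cars`, `bikes`); it assumes the real file uses the same `(Blank)` marker and comma separators.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the columns are illustrative only.
raw = io.StringIO(
    'region,cars,bikes\n'
    'North,"1,200",(Blank)\n'
    'South,(Blank),35\n'
)

# Treat "(Blank)" as missing and parse "1,200" as the number 1200.
toy = pd.read_csv(raw, na_values=["(Blank)"], thousands=",")

# Missing values make the columns float, so fill them before casting to int.
value_cols = ["cars", "bikes"]
toy[value_cols] = toy[value_cols].fillna(0).astype(int)

print(toy)
```

With this approach the placeholder never reaches the DataFrame as a string, so the numeric columns need no manual cleanup afterwards.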

# Manually Drop Extra Columns

In [None]:
# df = df.iloc[:, :7]

In [None]:
# print(df.to_string())

## Remove Columns That Are Entirely NaN

In [None]:
df = df.dropna(axis=1, how="all")

In [None]:
print(df.to_string())
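
To make the effect of `how="all"` concrete, here is a minimal sketch on a hypothetical toy DataFrame (not the accident data). Only the column in which every value is missing is dropped; a column with some missing values is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data used only for illustration.
toy = pd.DataFrame({
    "region": ["North", "South"],
    "count": [10, np.nan],          # partly empty: kept
    "unnamed": [np.nan, np.nan],    # entirely empty: dropped
})

# axis=1 works on columns; how="all" drops a column only if all its values are NaN.
trimmed = toy.dropna(axis=1, how="all")

print(list(trimmed.columns))
```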

# Info About the Data

In [None]:
# info() prints its summary itself and returns None, so wrapping it in print() adds a stray "None"
df.info()

# Data Cleaning
Data cleaning means fixing bad data in your data set.

Bad data could be:

- Empty cells
- Data in wrong format
- Wrong data
- Duplicates

## Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells.

This is usually acceptable when the data set is large, because removing a few rows has little effect on the result.

In pandas, `NA` (Not Available) values represent missing or null data in a dataset. 

> Note: By default, the dropna() method returns a new DataFrame, and will not change the original.

In [None]:
# df = df.dropna()

In [None]:
# print(df.to_string())
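
The note above can be checked with a short sketch. The toy DataFrame and its column names are hypothetical; the point is only that `dropna()` leaves the original untouched unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value.
toy = pd.DataFrame({
    "region": ["North", "South", "East"],
    "count": [10, np.nan, 7],
})

# dropna() returns a new DataFrame without the rows that contain NaN.
cleaned = toy.dropna()

# The original still has all three rows; the copy has two.
print(len(toy), len(cleaned))

# subset limits the check to the listed columns.
cleaned_by_count = toy.dropna(subset=["count"])
```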

## Replace Empty Values
Another way of dealing with empty cells is to insert a new value instead.

This way you do not have to delete entire rows just because of some empty cells.

The `fillna()` method allows us to replace empty cells with a value:

In [None]:
df.fillna(0, inplace=True)

In [None]:
print(df.to_string())
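
Filling every empty cell with `0` is not always appropriate, for example in text columns. As a hedged sketch, `fillna()` also accepts a dictionary that maps column names to fill values. The toy data and column names below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a missing label and a missing number.
toy = pd.DataFrame({
    "region": ["North", None],
    "count": [10, np.nan],
})

# Use a different fill value per column instead of a single value for all.
filled = toy.fillna({"region": "Unknown", "count": 0})

print(filled)
```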

## Removing Duplicates
To remove duplicates, use the `drop_duplicates()` method.

In [None]:
df.drop_duplicates(inplace=True)
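
Before removing duplicates it can help to see which rows would be dropped. A minimal sketch on hypothetical toy data: `duplicated()` marks every repeat of an earlier row, and `drop_duplicates()` keeps the first occurrence by default.

```python
import pandas as pd

# Hypothetical toy data; the second row repeats the first.
toy = pd.DataFrame({
    "region": ["North", "North", "South"],
    "count": [10, 10, 7],
})

# True marks a row that is identical to an earlier one.
print(toy.duplicated().tolist())

# Returns a new DataFrame; the original index labels are preserved.
deduped = toy.drop_duplicates()
print(deduped)
```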

## Wrong Data
"Wrong data" does not have to be an empty cell or a value in the wrong format; it can simply be an incorrect value.

## Replacing Values
One way to fix wrong values is to replace them with something else.

In [None]:
df.columns.size

In [None]:
# Skip the first three columns; in the remaining ones, replace the '(Blank)' placeholder with 0
for col in df.columns[3:]:
    for row in df.index:
        if df.loc[row, col] == '(Blank)':
            df.loc[row, col] = 0

In [None]:
print(df.to_string())
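
The cell-by-cell loop above can also be written as a single vectorized call. The sketch below is one possible alternative, not a required change. It assumes, as the loop does, that the first three columns are labels and the rest hold values. The toy data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; dtype=object keeps mixed strings and numbers as-is.
toy = pd.DataFrame(
    {
        "region": ["North", "South"],
        "year": [2020, 2020],
        "type": ["A", "B"],
        "cars": ["1,200", "(Blank)"],
        "bikes": ["(Blank)", "35"],
    },
    dtype=object,
)

# Select every column from position 3 onward, as the loop does.
value_cols = toy.columns[3:]

# Replace the placeholder in all selected columns at once.
toy[value_cols] = toy[value_cols].replace("(Blank)", 0)

print(toy)
```

On large data sets the vectorized form is usually much faster than looping over `df.loc` one cell at a time.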

## Data Types and Conversion

In [None]:
# Strip thousands separators from the value columns, then convert them to integers
for col in df.columns[3:]:
    df[col] = df[col].replace({',': ''}, regex=True).astype(int)
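
`astype(int)` raises an error as soon as one value cannot be parsed. A more tolerant variant, shown here as a sketch on hypothetical toy data, uses `pd.to_numeric()` with `errors="coerce"`, which turns unparseable values into `NaN` so they can be filled afterwards. Filling them with `0` is an assumption carried over from the earlier steps and may not suit every data set.

```python
import pandas as pd

# Hypothetical column mixing formatted strings, a number, and a stray label.
toy = pd.DataFrame({"cars": ["1,200", 0, "35", "n/a"]}, dtype=object)

# Work on strings so the thousands separator can be removed uniformly.
stripped = toy["cars"].astype(str).str.replace(",", "", regex=False)

# Unparseable values such as "n/a" become NaN instead of raising an error.
numeric = pd.to_numeric(stripped, errors="coerce")

# Fill the gaps, then cast to integers.
toy["cars"] = numeric.fillna(0).astype(int)

print(toy["cars"].tolist())
```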

## Save Cleaned Data

In [None]:
df.to_csv("../data/cleaned/accident_region_cleaned.csv", index=False)