# File Header

- In a CSV data file, the header contains the name of each field. The header row must use the same delimiter as the data rows, and it specifies how the data fields should be interpreted.
- Case 1: The data file has a header:
    Column names are assigned automatically from the header.
- Case 2: The data file has no header:
    We need to assign a name to each column manually.
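
The two cases can be sketched with `csv.DictReader` from the standard library. This is a minimal illustration that assumes small in-memory samples in place of a real file; the sample values and the variable names are made up for the example.

```python
import csv
from io import StringIO

# Case 1: a header is present, so DictReader takes the field names
# from the first line of the file.
with_header = StringIO("Name,Age\nRob,27\nMichael,29\n")
rows_with_header = list(csv.DictReader(with_header))

# Case 2: no header, so the field names are supplied manually.
without_header = StringIO("Rob,27\nMichael,29\n")
rows_without_header = list(
    csv.DictReader(without_header, fieldnames=["Name", "Age"])
)

print(rows_with_header[0])
print(rows_without_header[0])
```

Both readers produce the same records; the only difference is where the column names come from.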

# Delimiter

- In CSV data files, the comma is the standard delimiter.
- The delimiter separates the field values in each record.
- The delimiter matters when loading a CSV file into an ML project because a file may use a different delimiter, such as a tab or white space, and the loader must be told which one to expect.
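
As a rough sketch of a non-comma delimiter, the snippet below reads tab-separated text with `csv.reader`. The sample text is an assumption made for illustration, not taken from a real file.

```python
import csv
from io import StringIO

# Tab-separated sample; "\t" is the delimiter instead of a comma.
tab_text = "Name\tAge\nRob\t27\n"

# Tell the reader which delimiter the data uses.
rows = list(csv.reader(StringIO(tab_text), delimiter="\t"))
print(rows)
```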

# Quotes

- In CSV files, the double quotation mark (") is the default quote character.
- Quotes matter when loading a CSV file into an ML project because a file may use a quote character other than the double quotation mark.
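
The effect of the quote character can be sketched as follows. The records are hypothetical; the point is that a quoted field may contain the delimiter, and that `csv.reader` accepts a different `quotechar` when a file does not use double quotes.

```python
import csv
from io import StringIO

# Default quoting: the double quotes protect the comma inside the name.
default_text = '"Smith, Rob",27\n'
default_rows = list(csv.reader(StringIO(default_text)))

# The same record written with single quotes needs quotechar="'".
single_text = "'Smith, Rob',27\n"
single_rows = list(csv.reader(StringIO(single_text), quotechar="'"))

print(default_rows)
print(single_rows)
```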

# Load CSV with Python Std Library

- A common approach to loading a CSV file is to use the Python standard library.
- It provides the built-in csv module and its reader() function.
- Import the csv module provided by the Python standard library: import csv
- Next, import the numpy module to convert the loaded data into a NumPy array: import numpy as np
- Next, provide the path of the file.
- Next, use the csv.reader() function to read the data from the CSV file.

#### Example:

In [19]:
import csv
import numpy as np
path = r'..\..\..\income.csv'
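with open(path, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    headers = next(reader)    # the first row holds the column names
    data = list(reader)       # remaining rows, each a list of strings
    data = np.array(data)     # 2-D NumPy array of strings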
print(headers)
print(data.shape)
print(data[:3])

['Name', 'Age', 'Income($)']
(22, 3)
[['Rob' '27' '70000']
 ['Michael' '29' '90000']
 ['Mohan' '29' '61000']]


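# Load CSV with loadtxt

- Another approach is NumPy's numpy.loadtxt() function.
- By default it converts every value to float, so a file with a header row or text columns fails to load as-is, as the example shows.

#### Example: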

In [20]:
from numpy import loadtxt
path = r"..\..\..\income.csv"
datapath = open(path, 'r')
data = loadtxt(datapath, delimiter=",")
print(data.shape)
print(data[:3])

ValueError: could not convert string to float: 'Name'
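
The error above arises because the first line holds column names rather than numbers. One possible workaround, sketched here on an in-memory sample built from the first rows shown earlier (an assumption standing in for the real file), is to skip the header with `skiprows` and load only the numeric columns with `usecols`.

```python
from io import StringIO
import numpy as np

# In-memory stand-in for the CSV file (header plus the first three rows).
sample = StringIO(
    "Name,Age,Income($)\n"
    "Rob,27,70000\n"
    "Michael,29,90000\n"
    "Mohan,29,61000\n"
)

# skiprows=1 drops the header line; usecols=(1, 2) keeps only the
# numeric Age and Income columns, since Name cannot be parsed as float.
data = np.loadtxt(sample, delimiter=",", skiprows=1, usecols=(1, 2))
print(data.shape)
print(data)
```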

# Load CSV with Pandas

- Another approach to loading a CSV data file is the pandas.read_csv() function from Pandas.
- It returns a pandas.DataFrame, which can be used for plotting.

#### Example:

In [22]:
from pandas import read_csv
path = r"..\..\..\income.csv"
data = read_csv(path)
print(data.shape)
print(data[:3])

(22, 3)
      Name  Age  Income($)
0      Rob   27      70000
1  Michael   29      90000
2    Mohan   29      61000


In [23]:
# giving header names
from pandas import read_csv
path = r"..\..\..\income.csv"
headernames = ["First Name", "Age","Income"]
data = read_csv(path, names=headernames)
print(data.head())

  First Name  Age     Income
0       Name  Age  Income($)
1        Rob   27      70000
2    Michael   29      90000
3      Mohan   29      61000
4     Ismail   28      60000
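
In the output above, the original header line appears as data row 0, because `names` was passed without telling pandas that the file already has a header. A minimal sketch of one way to replace the existing header, assuming an in-memory sample in place of the real file, is to add `header=0`.

```python
from io import StringIO
from pandas import read_csv

# In-memory stand-in for the CSV file (header plus the first two rows).
sample = StringIO("Name,Age,Income($)\nRob,27,70000\nMichael,29,90000\n")

headernames = ["First Name", "Age", "Income"]

# header=0 marks the first line as a header to be replaced by `names`,
# so it is no longer read as a data row.
data = read_csv(sample, names=headernames, header=0)
print(data.head())
```

With the header line consumed, the Age and Income columns are parsed as integers instead of text.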


# Correlation

- The relationship between two variables is called correlation.
- In statistics, the most common method for calculating correlation is Pearson's correlation coefficient.
- Coefficient value = 1: perfect positive linear correlation between the variables.
- Coefficient value = -1: perfect negative linear correlation between the variables.
- Coefficient value = 0: no linear correlation between the variables.
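
To make the three cases concrete, here is a small sketch that computes Pearson's coefficient as the sum of products of deviations divided by the square root of the product of the summed squared deviations. The helper name `pearson` and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = [1, 2, 3, 4]
r_pos = pearson(x, [2, 4, 6, 8])     # y rises in step with x
r_neg = pearson(x, [8, 6, 4, 2])     # y falls in step with x
r_zero = pearson(x, [1, -1, -1, 1])  # no linear trend in y
print(r_pos, r_neg, r_zero)
```

The three results land at, or within floating-point rounding of, 1, -1 and 0 respectively.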

In [24]:
from pandas import read_csv
from pandas import set_option
path = r"..\..\..\income.csv"
names = ['name','age','income']
data = read_csv(path, names=names)
set_option('display.width',100)
set_option('display.precision',2)
correlations = data.corr(method='pearson')
print(correlations)

Empty DataFrame
Columns: []
Index: []
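
The empty result is most likely caused by the header line being read as a data row: `names` is passed without `header=0`, so every column is parsed as text and no numeric columns remain to correlate (some pandas versions may raise an error here instead). A hedged sketch of a fix, using an in-memory sample of the first four rows shown earlier rather than the full file:

```python
from io import StringIO
from pandas import read_csv

# In-memory stand-in for the CSV file (header plus the first four rows).
sample = StringIO(
    "Name,Age,Income($)\n"
    "Rob,27,70000\n"
    "Michael,29,90000\n"
    "Mohan,29,61000\n"
    "Ismail,28,60000\n"
)

names = ['name', 'age', 'income']

# header=0 replaces the existing header so the numbers parse as numbers.
data = read_csv(sample, names=names, header=0)

# Correlate only the numeric columns; 'name' is text.
correlations = data[['age', 'income']].corr(method='pearson')
print(correlations)
```

The result is a 2x2 matrix with 1.0 on the diagonal. The off-diagonal value depends on the rows loaded, so the four-row sample will not match the figure for the full file.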
