# Reading Data with Python

```python
import pandas as pd
```

## Reading CSV and TXT

The simplest way to read a file in Python is to open it with the built-in `open` function inside a `with` statement, which closes the file automatically when the block ends.

```python
# Print the first 11 lines (indices 0-10) of the file.
# Iterating over the file object reads one line at a time instead of loading everything with readlines().
with open("../data/btc-market-price.csv", "r") as fp:
    for index, line in enumerate(fp):
        if index > 10:
            break
        # strip() removes the trailing newline, so each record prints on one line
        timestamp, price = line.strip().split(",")
        print(f"{timestamp}: ${price}")
```

```
2017-04-02 00:00:00: $1099.169125
2017-04-03 00:00:00: $1141.813
2017-04-04 00:00:00: $1141.6003625
2017-04-05 00:00:00: $1133.0793142857142
2017-04-06 00:00:00: $1196.3079375
2017-04-07 00:00:00: $1190.45425
2017-04-08 00:00:00: $1181.1498375
2017-04-09 00:00:00: $1208.8005
2017-04-10 00:00:00: $1207.744875
2017-04-11 00:00:00: $1226.6170375
2017-04-12 00:00:00: $1218.92205
```
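
Calling `fp.readlines()` loads the entire file into memory at once, which is wasteful when only the first few lines are needed. A minimal sketch of a lazier alternative uses `itertools.islice`, which stops reading after *n* lines. The helper `head_lines` is a hypothetical name, and the small temporary file stands in for `btc-market-price.csv` (its rows are copied from the output above).

```python
import tempfile
from itertools import islice
from pathlib import Path


def head_lines(path, n):
    """Return the first n lines of a comma-separated file as (timestamp, price) tuples.

    islice stops after n lines, so the rest of the file is never read.
    """
    with open(path, "r") as fp:
        return [tuple(line.strip().split(",")) for line in islice(fp, n)]


# Build a tiny sample file so the example is self-contained
sample = (
    "2017-04-02 00:00:00,1099.169125\n"
    "2017-04-03 00:00:00,1141.813\n"
    "2017-04-04 00:00:00,1141.6003625\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "btc-sample.csv"
    path.write_text(sample)
    rows = head_lines(path, 2)

for timestamp, price in rows:
    print(f"{timestamp}: ${price}")
```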



`exam_scores.csv`, however, uses `>` instead of a comma to separate fields. Python's built-in `csv` module handles this: pass the separator as the `delimiter` argument of `csv.reader`.

```python
import csv
```

```python
# newline="" is what the csv module documentation recommends when opening files for csv.reader
with open("../data/exam_scores.csv", "r", newline="") as fp:
    reader = csv.reader(fp, delimiter=">")
    next(reader)  # skip the header row
    for values in reader:
        if not values:  # csv.reader returns [] for blank lines; skip them
            continue
        fname, lname, age, math, french = values
        print(f"Name: {fname} {lname}, Age: {age} \n\t Math: {math}, French: {french} \n")
```

```
Name: Ray Morley, Age: 18 
	 Math: 68,000, French: 75,000 

Name: Melvin Scott, Age: 24 
	 Math: 77, French: 83 

Name: Amirah Haley, Age: 22 
	 Math: 92, French: 67 

Name: Gerard Mills, Age: 19 
	 Math: 78,000, French: 72 

Name: Amy Grimes, Age: 23 
	 Math: 91, French: 81 
```
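
When a file has a header row, `csv.DictReader` is often more convenient than unpacking values by position: it uses the header as dictionary keys and skips blank lines on its own. The sketch below reads an in-memory sample instead of the real file; the header names are taken from the Pandas output later in this notebook, and storing `68,000` unquoted is an assumption based on the output above.

```python
import csv
import io

# In-memory stand-in for exam_scores.csv (two records plus a blank line)
sample = io.StringIO(
    "first_name>last_name>age>math_score>french_score\n"
    "Ray>Morley>18>68,000>75,000\n"
    "\n"
    "Melvin>Scott>24>77>83\n"
)

# DictReader consumes the header itself and ignores the blank line
reader = csv.DictReader(sample, delimiter=">")
students = list(reader)

for row in students:
    print(f"{row['first_name']} {row['last_name']}: math={row['math_score']}")
```

Because fields are looked up by name, reordering the columns in the file would not break this loop, whereas positional unpacking would silently assign the wrong values.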



## Reading Data with Pandas

Besides `read_csv`, Pandas provides many other reader functions, such as `read_json`, `read_excel`, `read_html`, and `read_sql`.
![pandas read function list](../img/pandas-read.jpg)
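
As a quick illustration of one of these readers, `read_json` turns a list of JSON objects into a DataFrame. The records below are hypothetical; wrapping the text in `io.StringIO` is used because recent Pandas versions deprecate passing literal JSON strings directly.

```python
import io

import pandas as pd

# Hypothetical records in "records" orientation: one JSON object per row
json_text = (
    '[{"Timestamp": "2017-04-02", "Price": 1099.17},'
    ' {"Timestamp": "2017-04-03", "Price": 1141.81}]'
)

df_json = pd.read_json(io.StringIO(json_text), orient="records")
print(df_json)
```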

In IPython and Jupyter, appending `?` to a function name displays its documentation. For example, running `pd.read_json?` shows the docstring for `read_json`.
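
The `?` suffix only works in IPython and Jupyter. In a plain Python script, the built-in `help()` prints the same docstring, and the standard-library `inspect.signature` can list a function's parameters. A small sketch:

```python
import inspect

import pandas as pd

# help(pd.read_json) would print the full docstring, like `pd.read_json?` does in Jupyter.
# inspect.signature returns just the parameter names, which is handy for a quick check.
params = list(inspect.signature(pd.read_csv).parameters)
print(params[:5])
```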

Pandas can also read a file directly from a URL.

```python
# The URL must point to the raw file (raw.githubusercontent.com), not the GitHub HTML page
csv_url = "https://raw.githubusercontent.com/krishnatray/RDP-Reading-Data-with-Python-and-Pandas/master/unit-1-reading-data-with-python-and-pandas/lesson-1-reading-csv-and-txt-files/files/btc-market-price.csv"

# The file has no header row, so we supply the column names ourselves
pd.read_csv(csv_url, delimiter=",", names=["Timestamp", "Price"]).head()
```

|   | Timestamp | Price |
|---|---|---|
| 0 | 2/4/17 0:00 | 1099.169125 |
| 1 | 3/4/17 0:00 | 1141.813 |
| 2 | 4/4/17 0:00 | ? |
| 3 | 5/4/17 0:00 | 1133.079314 |
| 4 | 6/4/17 0:00 | - |


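### Handling Missing Values

The `?` and `-` entries above are placeholders for missing prices. Listing them in `na_values` makes Pandas read them as `NaN`.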

```python
df = pd.read_csv(
    csv_url,
    sep=",",  # same as delimiter
    header=None,
    names=["Timestamp", "Price"],
    na_values=["", "?", "-"],  # strings to read as NaN
)
df.head()
```

|   | Timestamp | Price |
|---|---|---|
| 0 | 2/4/17 0:00 | 1099.169125 |
| 1 | 3/4/17 0:00 | 1141.813 |
| 2 | 4/4/17 0:00 | NaN |
| 3 | 5/4/17 0:00 | 1133.079314 |
| 4 | 6/4/17 0:00 | NaN |
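
A self-contained sketch of the same idea, assuming an in-memory copy of the five rows shown above in place of the remote file. It compares the `Price` column with and without `na_values`: without it, the placeholders are ordinary strings and the column cannot be numeric.

```python
import io

import pandas as pd

# In-memory copy of the first five rows shown above (the real file has more)
text = (
    "2/4/17 0:00,1099.169125\n"
    "3/4/17 0:00,1141.813\n"
    "4/4/17 0:00,?\n"
    "5/4/17 0:00,1133.079314\n"
    "6/4/17 0:00,-\n"
)
names = ["Timestamp", "Price"]

# Without na_values, "?" and "-" stay as strings, so Price is a non-numeric column
raw = pd.read_csv(io.StringIO(text), header=None, names=names)
# With na_values, they become NaN and Price is parsed as float64
clean = pd.read_csv(io.StringIO(text), header=None, names=names, na_values=["?", "-"])

print(raw["Price"].dtype, clean["Price"].dtype)
print(clean["Price"].isna().sum())
```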


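### Specifying Column Types

The `dtype` parameter sets the type of the listed columns while the file is parsed, instead of leaving the choice to Pandas' type inference.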

```python
df = pd.read_csv(
    csv_url,
    sep=",",  # same as delimiter
    header=None,
    na_values=["", "?", "-"],  # strings to read as NaN
    names=["Timestamp", "Price"],
    dtype={"Price": "float"},  # parse the Price column as float64
)
df.head()
```

|   | Timestamp | Price |
|---|---|---|
| 0 | 2/4/17 0:00 | 1099.169125 |
| 1 | 3/4/17 0:00 | 1141.813 |
| 2 | 4/4/17 0:00 | NaN |
| 3 | 5/4/17 0:00 | 1133.079314 |
| 4 | 6/4/17 0:00 | NaN |


```python
df.dtypes
```

```
Timestamp     object
Price        float64
dtype: object
```
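
Note that `dtype` relies on `na_values` in this example: if the placeholders were not declared as missing, forcing `Price` to float would fail on the `?` string. A sketch with a hypothetical two-row sample shows both cases.

```python
import io

import pandas as pd

text = "2/4/17 0:00,1099.169125\n4/4/17 0:00,?\n"
names = ["Timestamp", "Price"]

# Forcing float without na_values: the "?" placeholder cannot be converted
try:
    pd.read_csv(io.StringIO(text), header=None, names=names, dtype={"Price": "float"})
    failed = False
except ValueError as exc:
    failed = True
    print(f"ValueError: {exc}")

# Declaring "?" as missing first lets the float conversion succeed, with NaN in its place
ok = pd.read_csv(
    io.StringIO(text), header=None, names=names, na_values=["?"], dtype={"Price": "float"}
)
print(ok["Price"].tolist())
```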

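```python
# The timestamps are day-first ("2/4/17" is 2 April 2017), so state the format explicitly;
# without it, pd.to_datetime would read "2/4/17" month-first as 4 February 2017
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d/%m/%y %H:%M")

df.dtypes
```

```
Timestamp    datetime64[ns]
Price               float64
dtype: object
```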
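
One caveat when converting `Timestamp`: these strings are day-first (`2/4/17` is 2 April 2017, matching the first row of the local `btc-market-price.csv`), while `pd.to_datetime` reads ambiguous dates month-first by default. A minimal check of the difference on a single value:

```python
import pandas as pd

ambiguous = "2/4/17 0:00"

# Default parsing treats the first number as the month: 4 February 2017
month_first = pd.to_datetime(ambiguous)
# An explicit day-first format gives the intended 2 April 2017
day_first = pd.to_datetime(ambiguous, format="%d/%m/%y %H:%M")

print(month_first.date(), day_first.date())
```

Passing `dayfirst=True` instead of `format` is another option, but an explicit format fails loudly on rows that do not match, rather than guessing.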

### Using a Different Separator/Delimiter

```python
pd.read_csv("../data/exam_scores.csv", delimiter=">")
```

|   | first_name | last_name | age | math_score | french_score |
|---|---|---|---|---|---|
| 0 | Ray | Morley | 18 | 68000 | 75000 |
| 1 | Melvin | Scott | 24 | 77 | 83 |
| 2 | Amirah | Haley | 22 | 92 | 67 |
| 3 | Gerard | Mills | 19 | 78000 | 72 |
| 4 | Amy | Grimes | 23 | 91 | 81 |


```python
# sep is an alias for delimiter; both give the same result
pd.read_csv("../data/exam_scores.csv", sep=">")
```

|   | first_name | last_name | age | math_score | french_score |
|---|---|---|---|---|---|
| 0 | Ray | Morley | 18 | 68000 | 75000 |
| 1 | Melvin | Scott | 24 | 77 | 83 |
| 2 | Amirah | Haley | 22 | 92 | 67 |
| 3 | Gerard | Mills | 19 | 78000 | 72 |
| 4 | Amy | Grimes | 23 | 91 | 81 |


Other useful parameters include `skiprows`, `usecols`, `nrows`, and `thousands`; the `read_csv` documentation lists them all.
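
A small sketch combining several of these parameters on a hypothetical `>`-separated sample. The column names follow `exam_scores.csv`; the leading comment line is made up, and the `68,000`-style values mirror the `csv` module output earlier.

```python
import io

import pandas as pd

text = (
    "# exported from a grading tool\n"
    "first_name>last_name>age>math_score>french_score\n"
    "Ray>Morley>18>68,000>75,000\n"
    "Melvin>Scott>24>77>83\n"
    "Amirah>Haley>22>92>67\n"
)

scores = pd.read_csv(
    io.StringIO(text),
    sep=">",
    skiprows=1,                             # skip the first line (the comment) before the header
    usecols=["first_name", "math_score"],   # keep only these columns
    nrows=2,                                # read at most two data rows
    thousands=",",                          # treat "," as a thousands separator: "68,000" -> 68000
)
print(scores)
```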

Pandas can also write files. Most `read_*` functions have a matching `to_*` method on the DataFrame, for example `read_csv` and `to_csv`, or `read_json` and `to_json`.

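```python
# index=False stops Pandas from writing the row index as an extra, unnamed first column
df.to_csv("../data/hello.csv", index=False)
```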
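
A sketch of the write/read round trip, showing why `index=False` is usually worth passing: without it, re-reading the file produces an extra `Unnamed: 0` column holding the old index. The DataFrame `df_small` is hypothetical, and a temporary directory stands in for `../data/`.

```python
import tempfile
from pathlib import Path

import pandas as pd

df_small = pd.DataFrame({"Timestamp": ["2017-04-02", "2017-04-03"], "Price": [1099.17, 1141.81]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = Path(tmp) / "with_index.csv"
    without_index = Path(tmp) / "without_index.csv"

    df_small.to_csv(with_index)                  # index written as a first, unnamed column
    df_small.to_csv(without_index, index=False)  # only the data columns are written

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)
print(cols_without)
```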

### Reading Data from a Database

The first thing you need is a driver library for the database engine you're using (PostgreSQL, MySQL, etc.). This is unlike web APIs, where the same HTTP library works for every service.
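
A minimal sketch using SQLite, whose driver (`sqlite3`) ships with Python's standard library, so nothing extra needs installing. The in-memory database, the `prices` table, and its rows are hypothetical; `pd.read_sql` runs the query and returns the result as a DataFrame.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a small hypothetical table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (timestamp TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("2017-04-02", 1099.169125), ("2017-04-03", 1141.813)],
)
conn.commit()

# Pandas accepts a sqlite3 connection directly; for other engines it expects a SQLAlchemy connectable
prices = pd.read_sql("SELECT timestamp, price FROM prices ORDER BY timestamp", conn)
conn.close()

print(prices)
```

For PostgreSQL or MySQL the pattern is the same, but the connection would come from that engine's driver, typically wrapped in a SQLAlchemy engine.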