# Reading in our data: Absolute and relative file paths

We use the `pandas` library to read data, such as a .csv file, into Python. In this example, we will use the `pd.read_csv()` function. 

You can find more at this [Datacamp tutorial](https://www.datacamp.com/community/tutorials/pandas-read-csv). 

Note that I am mixing Markdown and code comments in this notebook, to get you used to both. Code comments are for you and others who will be looking at the code. Markdown lets us write up more formal reports.

In [1]:
# Bring in the packages that we need. 
import numpy as np  
import pandas as pd  
from pylab import plt, mpl 

# Set up output
plt.style.use('seaborn-v0_8')  # This style was called 'seaborn' before Matplotlib 3.6
mpl.rcParams['font.family'] = 'serif'  
%config InlineBackend.figure_format = 'svg'

In [2]:
# A relative file path is resolved from your current working directory. For a notebook, that is usually the folder the notebook file lives in. My main folder is fin-data-analysis-python, which lives on my computer (and on Github, where you can access it).
# You can right-click on a file in the explorer and copy either the full path or the relative path.
# You can even read the URL for where the data lives on Github!

# Full file path: /Users/adamaiken/fin-data-analysis-python/data/tr_eikon_eod_data.csv
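# Relative file path from the main folder: data/tr_eikon_eod_data.csv
# Relative file path from a subfolder of the main folder (what the code below uses): ../data/tr_eikon_eod_data.csv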

# You should use relative paths for your local data. Why? That way, you can always have the same set-up: a main folder with subfolders for code, data, output, etc. 
# Then, you just always refer to that common set-up, without worrying exactly where the files live. 
# It is the relationship between your files that matters when using relative file paths.

# You can also just use the data that I post to Github without ever copying it locally! You have to click around a bit to get the right "raw" URL though.

# We'll use read_csv from the pandas package to read in the CSV file. CSV files are a common way to store data.

# The code below should create three identical dataframes, with all of the tickers, using three different ways to get the same data. I then subset and create three identical dataframes with just the SPX.

data_r = pd.read_csv('../data/tr_eikon_eod_data.csv', index_col=0, parse_dates=True)  
data_r_spx = pd.DataFrame(data_r['.SPX'])
data_r_spx.dropna(inplace=True) 
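
To see why the relationship between your files matters, here is a minimal sketch that builds a throwaway folder layout and checks the same relative path from two different working directories. The folder names (`project`, `code`, `data`), the file `example.csv`, and the price in it are all made up for illustration. They are not part of the course repository.

```python
import os
import tempfile
from pathlib import Path

# Build a throwaway layout (hypothetical names):
#   <base>/project/data/example.csv
#   <base>/project/code/
base = Path(tempfile.mkdtemp()).resolve()
root = base / "project"
(root / "data").mkdir(parents=True)
(root / "code").mkdir()
(root / "data" / "example.csv").write_text("Date,.SPX\n2020-01-02,100.0\n")  # made-up value

relative = Path("../data/example.csv")

original_cwd = Path.cwd()
try:
    # From <root>/code, "../data/example.csv" goes up one level, then into data.
    os.chdir(root / "code")
    found_from_code = relative.exists()
    resolved_from_code = relative.resolve()

    # From <root>, the same string goes up to <base>, where there is no data folder.
    os.chdir(root)
    found_from_root = relative.exists()
finally:
    # Always put the working directory back where we found it.
    os.chdir(original_cwd)

print(found_from_code)
print(found_from_root)
print(resolved_from_code == root / "data" / "example.csv")
```

The same string finds the file from one folder and misses it from another. If `pd.read_csv()` raises a `FileNotFoundError` on a relative path, checking `Path.cwd()` is a good first step.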

 

In [3]:
data_f = pd.read_csv('/Users/adamaiken/fin-data-analysis-python/data/tr_eikon_eod_data.csv',
                  index_col=0, parse_dates=True)  
data_f_spx = pd.DataFrame(data_f['.SPX'])
data_f_spx.dropna(inplace=True) 
  

In [4]:
data_u = pd.read_csv('https://raw.githubusercontent.com/aaiken1/fin-data-analysis-python/main/data/tr_eikon_eod_data.csv',
                  index_col=0, parse_dates=True)  
data_u_spx = pd.DataFrame(data_u['.SPX'])
data_u_spx.dropna(inplace=True) 
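
If you would rather not click around for the "raw" URL, the sketch below rewrites a regular Github file-page URL into the raw form used above. `to_raw_url` is a hypothetical helper I made up for this example, not part of pandas or any Github library, and the page URL in the usage line is my assumption about where the file sits on Github, worked backwards from the raw URL in the cell above.

```python
def to_raw_url(blob_url: str) -> str:
    """Turn a github.com '.../blob/<branch>/<path>' URL into its raw URL.

    Hypothetical helper for illustration only.
    """
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix):
        raise ValueError("Expected a URL starting with https://github.com/")

    # Split into: user, repo, the literal 'blob', and '<branch>/<path>'.
    parts = blob_url[len(prefix):].split("/", 3)
    if len(parts) != 4 or parts[2] != "blob":
        raise ValueError("Expected a URL of the form .../<user>/<repo>/blob/<branch>/<path>")

    user, repo, _, branch_and_path = parts
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch_and_path}"


# Assumed file-page URL for the data set used in this notebook.
page_url = "https://github.com/aaiken1/fin-data-analysis-python/blob/main/data/tr_eikon_eod_data.csv"
raw_url = to_raw_url(page_url)
print(raw_url)
```

You could then pass `raw_url` straight to `pd.read_csv()`, just like the URL typed out by hand above.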


In [5]:
# We can even test whether two DataFrames are the same! This is like automatically checking whether two Excel worksheets are identical. Very cool!

# The DataFrame.equals() method does this for us. 

# Are they the same? You can change the DataFrame names to test the others!
print(data_r.equals(data_f))

True


In [6]:
print(data_r_spx.equals(data_f_spx))

True
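
Here is a small sketch of what `equals()` does and does not treat as "the same", using tiny made-up DataFrames rather than the real data. It treats missing values in the same position as equal, which a plain `==` comparison does not, and it is strict about column data types.

```python
import numpy as np
import pandas as pd

# Two made-up frames with a missing value in the same spot.
a = pd.DataFrame({".SPX": [1.0, np.nan, 3.0]})
b = a.copy()

# equals() treats NaNs in the same location as equal.
same_with_equals = a.equals(b)

# Element-wise == does not, because NaN == NaN is False.
same_with_operator = bool((a == b).all().all())

# equals() also requires matching dtypes: integers vs. floats are "different".
ints = pd.DataFrame({".SPX": [1, 2, 3]})
floats = ints.astype(float)
same_across_dtypes = ints.equals(floats)

print(same_with_equals)
print(same_with_operator)
print(same_across_dtypes)
```

If two DataFrames that look identical come back as `False`, comparing `df.dtypes` on each is a good place to start.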
