# Importing Data from CSV, Text, and Excel Files into Pandas

Pandas has convenient methods for importing data from CSV, text, and Excel files.
<br>Data obtained from Kaggle for [heart failure prediction](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data) is used to demonstrate the **read_csv** method.

In [39]:
import pandas as pd

In [40]:
DfFromCSV = pd.read_csv('heart_failure_clinical_records_dataset.csv') 
DfFromCSV.head(3)

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1


Passing only the file name, as above, works as long as the file is in your working directory.

The **read_csv** method can also be used to load text files.
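
As a minimal sketch of reading a plain text file, the example below uses an in-memory string in place of a file on disk. The pipe-delimited layout and the values are made up for illustration; with a real file, the file name would be passed as the first argument instead.

```python
import io
import pandas as pd

# Hypothetical contents of a pipe-delimited .txt file (illustrative values only)
text_data = io.StringIO(
    "age|sex|smoking\n"
    "75|1|0\n"
    "55|1|0\n"
)

# sep tells read_csv which character separates the fields
DfFromTxt = pd.read_csv(text_data, sep='|')
print(DfFromTxt.shape)
```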

---
If the file is in a different location from your working directory, pass the full file path instead. On Windows, put an 'r' in front of the path to specify a raw string, so that the backslashes are not interpreted as escape characters.
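
To see why the 'r' prefix matters, here is a small sketch that compares a plain string with a raw string. The path is hypothetical and chosen only because it contains `\t` and `\n`, which Python reads as a tab and a newline in a plain string.

```python
from pathlib import PureWindowsPath

# Hypothetical path used only for illustration
plain = 'C:\temp\new_data.csv'   # \t and \n become a tab and a newline
raw = r'C:\temp\new_data.csv'    # backslashes are kept literally

# The plain string is shorter because two escape pairs collapsed to one character each
print(len(plain), len(raw))

# Forward slashes are an alternative that needs no raw string
forward = 'C:/temp/new_data.csv'
print(PureWindowsPath(raw) == PureWindowsPath(forward))
```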

In [38]:
DfFromCSV2 = pd.read_csv(r'C:\Users\TCang\exercises\Data Connections\heart_failure_clinical_records_dataset.csv') 
DfFromCSV2.head(3)

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1


---
The **read_csv** method is quite versatile in reading in different types of datasets. 
<br>For instance, when importing a tsv file (tab delimited), one only needs to change the 'sep' parameter to '\t'. 
<br>Another Kaggle dataset for [gapminder](https://www.kaggle.com/gbahdeyboh/gapminder) will be used for the following examples.

In [37]:
DfFromTSV = pd.read_csv('gapminder.tsv', sep='\t') 
DfFromTSV.head(3)

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap
0,Afghanistan,Asia,1952,28.801,8425333,779.445314
1,Afghanistan,Asia,1957,30.332,9240934,820.85303
2,Afghanistan,Asia,1962,31.997,10267083,853.10071


The method can also select data as it is read in from the file. To load only specific columns, pass their zero-based index positions to the usecols parameter.

In [41]:
DfFromTSV2 = pd.read_csv('gapminder.tsv', sep='\t', usecols = [0,1,4,5]) 
DfFromTSV2.head(3)

Unnamed: 0,country,continent,pop,gdpPercap
0,Afghanistan,Asia,8425333,779.445314
1,Afghanistan,Asia,9240934,820.85303
2,Afghanistan,Asia,10267083,853.10071
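
The usecols parameter also accepts column names, which can be easier to read than index positions. The sketch below assumes the column names shown in the gapminder output above and uses an in-memory copy of the first two rows in place of the file.

```python
import io
import pandas as pd

# First rows of the gapminder data, held in memory for illustration
tsv_data = io.StringIO(
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.445314\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.85303\n"
)

# Column names select the same columns as index positions [0, 1, 4, 5]
DfByName = pd.read_csv(tsv_data, sep='\t',
                       usecols=['country', 'continent', 'pop', 'gdpPercap'])
print(list(DfByName.columns))
```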


Further optional parameters for the **read_csv** method can be found [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html "pd.read_csv").

---
Similarly, the <b>read_excel</b> method can be used for importing Excel files.

In [15]:
DfFromExcel = pd.read_excel('heart_failure_clinical_records_dataset.xlsx')

In [16]:
DfFromExcel.head()

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1
3,50.0,1,111,0,20,0,210000.0,1.9,137,1,0,7,1
4,65.0,1,160,1,20,0,327000.0,2.7,116,0,0,8,1


Further information about the **read_excel** method is available [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html "pd.read_excel").

Data can also be loaded from a database by connecting to it with the pyodbc package.

Create a db file
Connect to it. 
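
The two steps above can be sketched as follows. Note that this is not the pyodbc route: pyodbc needs an ODBC driver and a connection string that are not given here, so the sketch swaps in Python's built-in sqlite3 module. The file name, table name, and row are hypothetical.

```python
import os
import sqlite3
import tempfile
import pandas as pd

# Create a db file in a temporary folder (hypothetical file and table names)
db_path = os.path.join(tempfile.mkdtemp(), 'example.db')
conn = sqlite3.connect(db_path)

# Write a small illustrative table so there is something to read back
pd.DataFrame({'country': ['Afghanistan'], 'year': [1952]}).to_sql(
    'gapminder', conn, index=False)

# Read the result of a SQL query into a DataFrame, then close the connection
DfFromDb = pd.read_sql_query('SELECT * FROM gapminder', conn)
conn.close()
print(DfFromDb.shape)
```

With pyodbc, the connection object would come from a pyodbc connection call instead, and the query step with pandas would look much the same.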

Connect to pdf and doc files