# <center> Working with Tabular Data Files </center> <a class="anchor" id="title"></a>

- [What is Tabular Data Format](#section_1)
- [Pandas read_csv() Function](#section_2)
- [Pandas read_excel() Function](#section_3)
- [Pandas read_sql() Function](#section_4)

<hr>

### What is Tabular Data Format <a class="anchor" id="section_1"></a>


Tabular data is organized into rows and columns and stored in file formats such as CSV, tab-delimited files, fixed-width files, and spreadsheets. These files can be read from the local machine or from an online location.
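
As a minimal sketch of two of the formats listed above, the following code reads a tab-delimited text and a fixed-width text into DataFrames. The sample strings and the column positions passed to `colspecs` are invented for illustration; `io.StringIO` stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for illustration
tab_text = "country\tbeer_servings\nAlbania\t89\nAlgeria\t25\n"
fixed_width_text = (
    "country   beer_servings\n"
    "Albania              89\n"
    "Algeria              25\n"
)

# Tab-delimited: read_csv() with an explicit separator
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Fixed-width: read_fwf() with explicit (start, end) character positions per column
df_fwf = pd.read_fwf(io.StringIO(fixed_width_text), colspecs=[(0, 10), (10, 23)])

print(df_tab)
print(df_fwf)
```

Both calls should produce the same two-column DataFrame, since the two strings hold the same records in different layouts.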

### Pandas read_csv() Function <a class="anchor" id="section_2"></a>

In [1]:
# Import Pandas library
import pandas as pd

In [None]:
# Display floats without decimal places (changes how values are shown, not the stored data)
pd.set_option('display.float_format', lambda x: '%.0f' % x)

In [2]:
# Create a DataFrame using read_csv() function
alcohol_data = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv')
# Display the DataFrame
alcohol_data

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
0,Afghanistan,0,0,0,0.0
1,Albania,89,132,54,4.9
2,Algeria,25,0,14,0.7
3,Andorra,245,138,312,12.4
4,Angola,217,57,45,5.9
...,...,...,...,...,...
188,Venezuela,333,100,3,7.7
189,Vietnam,111,2,1,2.0
190,Yemen,6,0,0,0.1
191,Zambia,32,19,4,2.5


In [16]:
# Create a DataFrame using read_csv() function.
# Read only the first 10 rows (nrows) and the columns at positions 0, 2, and 4 (usecols)
alcohol_data = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv',
                          nrows = 10,
                          usecols = [0,2,4])
# Display the DataFrame
alcohol_data

Unnamed: 0,country,spirit_servings,total_litres_of_pure_alcohol
0,Afghanistan,0,0.0
1,Albania,132,4.9
2,Algeria,0,0.7
3,Andorra,138,12.4
4,Angola,57,5.9
5,Antigua & Barbuda,128,4.9
6,Argentina,25,8.3
7,Armenia,179,3.8
8,Australia,72,10.4
9,Austria,75,9.7
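
The `usecols` parameter also accepts column names, which can be easier to read than positions. The sketch below assumes a three-row excerpt with the same header as the file above, held in a string so that it runs without network access, and checks that both forms select the same columns.

```python
import io

import pandas as pd

# Three-row excerpt with the same header as drinks.csv, kept in memory for illustration
csv_text = """country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
Afghanistan,0,0,0,0.0
Albania,89,132,54,4.9
Algeria,25,0,14,0.7
"""

# Select columns by position
by_position = pd.read_csv(io.StringIO(csv_text), nrows=2, usecols=[0, 2, 4])

# Select the same columns by name
by_name = pd.read_csv(
    io.StringIO(csv_text),
    nrows=2,
    usecols=["country", "spirit_servings", "total_litres_of_pure_alcohol"],
)

print(by_position.equals(by_name))
```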


In [18]:
# Create a DataFrame using read_csv() function. Use the 'country' column (by name) as the DataFrame index
alcohol_data = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv',
                          nrows = 10,
                          usecols = [0,2,4],
                          index_col = 'country')
# Display the DataFrame
alcohol_data

Unnamed: 0_level_0,spirit_servings,total_litres_of_pure_alcohol
country,Unnamed: 1_level_1,Unnamed: 2_level_1
Afghanistan,0,0.0
Albania,132,4.9
Algeria,0,0.7
Andorra,138,12.4
Angola,57,5.9
Antigua & Barbuda,128,4.9
Argentina,25,8.3
Armenia,179,3.8
Australia,72,10.4
Austria,75,9.7


In [37]:
# Create a DataFrame using read_csv() function. Use the first selected column (by position) as the DataFrame index
alcohol_data = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv',
                          nrows = 10,
                          usecols = [0,2,4],
                          index_col = 0)
# Display the DataFrame
alcohol_data

Unnamed: 0_level_0,spirit_servings,total_litres_of_pure_alcohol
country,Unnamed: 1_level_1,Unnamed: 2_level_1
Afghanistan,0,0
Albania,132,5
Algeria,0,1
Andorra,138,12
Angola,57,6
Antigua & Barbuda,128,5
Argentina,25,8
Armenia,179,4
Australia,72,10
Austria,75,10
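
The whole-number values in the last output are most likely the effect of the `display.float_format` option set near the top of this section, assuming that cell was run before this one. The option changes only how floats are displayed, not the values stored in the DataFrame. A minimal sketch, using a small hand-built DataFrame and `pd.option_context` so the setting applies only inside the `with` block:

```python
import pandas as pd

# Small DataFrame built by hand for illustration
df = pd.DataFrame(
    {"total_litres_of_pure_alcohol": [4.9, 0.7, 12.4]},
    index=["Albania", "Algeria", "Andorra"],
)

# Inside the block, floats are rendered with no decimal places
with pd.option_context("display.float_format", lambda x: "%.0f" % x):
    rounded_view = df.to_string()

# Outside the block, the default formatting is back
default_view = df.to_string()

print(rounded_view)
print(default_view)

# The stored value is unchanged in both cases
print(df.loc["Albania", "total_litres_of_pure_alcohol"])
```

To restore the default display in a notebook where `pd.set_option` was used, `pd.reset_option('display.float_format')` can be called.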


### Pandas read_excel() Function <a class="anchor" id="section_3"></a>

Spreadsheets are another commonly used tabular data format. The Pandas library provides the [read_excel()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html) function to read Microsoft Excel spreadsheet files.

In [38]:
# Create a DataFrame object using read_excel() function.
# Use the sheet_name parameter to select the worksheet that holds the dataset
my_data = pd.read_excel('countries sample dataset.xlsx',
                       sheet_name = "short_list")
# Display the DataFrame
my_data

Unnamed: 0,country_code,country_name,capital_city,population_size,area_km2
0,CN,China,Beijing,1440297825,9596961
1,AR,Argentina,Buenos Aires,45267449,2780400
2,IN,India,New Delhi,1382345085,3287263
3,NG,Nigeria,Abuja,206984347,923768
4,US,United States,Washington,331341050,9525067


In [39]:
# Create a DataFrame object using read_excel() function.
# Use the sheet_name parameter to select a different worksheet in the same file
my_data = pd.read_excel('countries sample dataset.xlsx',
                       sheet_name = "long_list")
# Display the DataFrame
my_data

Unnamed: 0,country_code,country_name,capital_city,population_size,area_km2
0,AT,Austria,Vienna,9015361,83871
1,BD,Bangladesh,Dhaka,164972348,148460
2,BR,Brazil,Brasília,212821986,8515767
3,EG,Egypt,Cairo,102659126,1002450
4,JP,Japan,Tokyo,126407422,377976
5,MA,Morocco,Rabat,36985624,446550
6,CA,Canada,Ottawa,37799407,9984670
7,NZ,New Zealand,Wellington,4829021,270467
8,MY,Malaysia,Kuala Lumpur,32436963,330803
9,PK,Pakistan,Islamabad,221612785,881913


### Pandas read_sql() Function <a class="anchor" id="section_4"></a>

Another common scenario is querying relational database tables with SQL. To do so, you need to provide the credentials and connection details required to connect to the database server. You can then pass a SQL query to the [read_sql()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html) function and load the result into a Pandas DataFrame object.

To simulate this scenario, the following code creates a local database using Python's built-in `sqlite3` module. We will then use Pandas to access the data using SQL queries.

In [None]:
# Import SQLite library
import sqlite3

# Path to the database file (the Datasets folder must already exist)
db_path = 'Datasets/local_db_example.db'

# Connect to the database; SQLite creates the file if it does not exist
conn = sqlite3.connect(db_path)

# Create a cursor to execute SQL statements
c = conn.cursor()

# Create a database table
c.execute("""CREATE TABLE mytable
         (id, name, position)""")

# Add some data
c.execute("""INSERT INTO mytable (id, name, position)
          values(1, 'James', 'Data Scientist')""")

c.execute("""INSERT INTO mytable (id, name, position)
          values(2, 'Mary', 'Software Developer')""")

c.execute("""INSERT INTO mytable (id, name, position)
          values(3, 'Max', 'Data Engineer')""")

# Commit the changes
conn.commit()

# Close the cursor and the connection
c.close()
conn.close()

The database file `local_db_example.db` should now appear in the `Datasets` folder next to your notebook. It contains dummy data describing employee details. The following code queries the data into a Pandas DataFrame object.

In [1]:
# Import the libraries again so this cell also runs in a fresh kernel
import sqlite3
import pandas as pd

# Identify the database name
database = 'Datasets/local_db_example.db'

# Establish a connection with the database file
conn = sqlite3.connect(database)

# Use Pandas function to pass SQL query and create a DataFrame object
df_people = pd.read_sql("select * from mytable", con = conn)

# Print the generated DataFrame
print(df_people)

# Close the connection
conn.close()

In the above example, we created a local database file, queried it with SQL through the Pandas library, and loaded the results into a Pandas DataFrame object. In practice, you will often query relational databases that are stored on remote servers or in the cloud.
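
As a variation on the example above, the sketch below uses an in-memory SQLite database (`":memory:"`), so no file is written, and passes a filter value through the `params` argument of `read_sql()` instead of embedding it in the SQL string. The table and rows mirror the ones created earlier; the choice of filter and the name `df_engineers` are for illustration only.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# ":memory:" creates a temporary database that disappears when the connection closes
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE mytable (id, name, position)")
    conn.executemany(
        "INSERT INTO mytable (id, name, position) VALUES (?, ?, ?)",
        [
            (1, "James", "Data Scientist"),
            (2, "Mary", "Software Developer"),
            (3, "Max", "Data Engineer"),
        ],
    )
    conn.commit()

    # The ? placeholder is filled from params by the database driver
    df_engineers = pd.read_sql(
        "SELECT name, position FROM mytable WHERE position = ?",
        con=conn,
        params=("Data Engineer",),
    )

print(df_engineers)
```

Wrapping the connection in `closing()` ensures it is closed even if a statement raises an error.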

**[Back to Top](#title)**

In this section, we learn how to access data from CSV files, Excel spreadsheet files, and SQL tables. First, we access a [CSV](https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv) file of alcohol consumption by country from the [fivethirtyeight GitHub Repository](https://github.com/fivethirtyeight/data). To do that, we use the [read_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function and pass the file's online location on GitHub. If the CSV file is stored on the local machine, we pass the file path instead.
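
To illustrate the local-file case, the following sketch writes a small CSV to a temporary folder and reads it back by path. The file name `drinks_sample.csv` and its contents are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical local file, created here only so the example is self-contained
    local_path = Path(tmp_dir) / "drinks_sample.csv"
    local_path.write_text("country,beer_servings\nAlbania,89\nAlgeria,25\n")

    # read_csv() accepts a local path in the same way it accepts a URL
    local_df = pd.read_csv(local_path)

print(local_df)
```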


Notice how the [read_excel()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html) example uses the `sheet_name` parameter to specify which worksheet contains the needed dataset. For a complete list of parameters for each function, see the official Pandas documentation by clicking the function name.