___

<a href='http://www.pieriandata.com'> <img src='./Pierian_Data_Logo.png' /></a>
___

# Data Input and Output

This notebook is reference code for reading and writing data. pandas can read a variety of file types using its `pd.read_*` functions. Let's take a look at the most common ones:

In [233]:
import numpy as np
import pandas as pd

In [234]:
import os
os.getcwd()

'c:\\PYTHON ELEMENTARY\\Python for DS and ML Bootcamp\\03-Python-for-Data-Analysis-Pandas'

In [235]:
# !pwd   # pwd stands for "print working directory".
# It is a Unix/Linux/macOS shell command; in a Jupyter notebook, !pwd runs it and shows the directory you are currently in.
# The os.getcwd() output above shows that we are working inside the '03-Python-for-Data-Analysis-Pandas' directory.

### **Our Jupyter notebook and all associated files (such as CSV or HTML) should be in the same directory.**

## CSV

### CSV Input

In [236]:
# Read a CSV file
dff = pd.read_csv('example')  # 'example' is the exact name of our CSV file.
dff  # now we can see the CSV contents as a DataFrame
# type(dff)  # pandas.core.frame.DataFrame

# - This reads the file named 'example' and loads it into a pandas DataFrame called dff.
# - A relative path like this one is resolved against the current working directory.
# - pandas does not guess extensions: the string must match the file name exactly, so a file saved as 'example.csv' has to be read with pd.read_csv('example.csv').
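As a minimal sketch of how `read_csv` resolves file names, the snippet below writes a small made-up file called `example` (no extension) into a temporary directory and then tries to read it under two names. The temporary directory and the file contents are assumptions for illustration, not part of the course files.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file with no extension, mirroring the 'example' file name.
    csv_path = Path(tmp_dir) / "example"
    csv_path.write_text("a,b,c,d\n0,1,2,3\n4,5,6,7\n")

    # The exact file name works, with or without an extension.
    loaded = pd.read_csv(csv_path)

    # A name that only adds '.csv' refers to a different file that does not exist.
    try:
        pd.read_csv(Path(tmp_dir) / "example.csv")
        found_with_extension = True
    except FileNotFoundError:
        found_with_extension = False

print(loaded.shape)
print(found_with_extension)
```

If a read fails with `FileNotFoundError`, comparing `os.getcwd()` with the real location and full name of the file is usually the quickest check.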

### CSV Output

In [237]:
# Write to a CSV file
# This writes the DataFrame dff back to the file named 'example', overwriting its contents.
# index=False tells pandas not to write the row index (0, 1, 2, ... by default) into the file as an extra column.
dff.to_csv('example',index=False)

In [238]:
pd.read_csv('example')

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15
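To see what `index=False` changes, here is a small sketch that writes the same DataFrame twice to in-memory buffers and reads each result back. The `frame` data and the use of `io.StringIO` in place of a file on disk are assumptions made to keep the example self-contained.

```python
import io

import pandas as pd

frame = pd.DataFrame({"a": [0, 4], "b": [1, 5]})

# Default behaviour: the index is written as a first column with an empty header.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
reread_without_index = pd.read_csv(without_index)

print(list(reread_with_index.columns))     # ['Unnamed: 0', 'a', 'b']
print(list(reread_without_index.columns))  # ['a', 'b']
```

A column named `Unnamed: 0` after a round trip is usually a sign that the index was saved without `index=False`.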


## Reading and writing from Excel files
pandas can read and write Excel files. Keep in mind that this only imports data, not formulas or images; workbooks that contain images or macros may cause `read_excel` to fail.

### Excel Input

In [239]:
pd.read_excel('Excel_Sample.xlsx',sheet_name='Sheet1')

Unnamed: 0.1,Unnamed: 0,a,b,c,d
0,0,0,1,2,3
1,1,4,5,6,7
2,2,8,9,10,11
3,3,12,13,14,15


In [240]:
pd.read_excel('Excel_Sample.xlsx', sheet_name='Sheet1', index_col=0)
# index_col=0 tells pandas to use the first column of the sheet (position 0) as the index of the resulting DataFrame instead of treating it as regular data.

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15
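The effect of `index_col=0` can be sketched without an Excel engine installed, because `pd.read_csv` accepts the same parameter. The `raw` text below is made-up data whose first column is a saved index with an empty header, similar to the sheet above.

```python
import io

import pandas as pd

# Made-up contents: the first column is a previously saved index with no header.
raw = ",a,b\n0,0,1\n1,4,5\n"

# Without index_col, the saved index comes back as a regular data column.
as_data = pd.read_csv(io.StringIO(raw))

# With index_col=0, the first column becomes the DataFrame index.
as_index = pd.read_csv(io.StringIO(raw), index_col=0)

print(list(as_data.columns))   # ['Unnamed: 0', 'a', 'b']
print(list(as_index.columns))  # ['a', 'b']
```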


### Excel Output

In [241]:
dff.to_excel('Excel_Sample2.xlsx',sheet_name='Sheet1')

In [242]:
pd.read_excel('Excel_Sample2.xlsx', index_col=0)

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15


## HTML

You may need to install lxml, html5lib, and BeautifulSoup4. In your terminal/command prompt run:

    conda install lxml
    conda install html5lib
    conda install beautifulsoup4

(or use `pip install` if you aren't using the Anaconda distribution). Then restart Jupyter Notebook.

pandas can read tables from HTML pages. For example:

### HTML Input

The pandas `read_html` function reads the tables on a web page and returns a list of DataFrame objects:

In [243]:
df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist')

In [244]:
df[0].head()

Unnamed: 0,Bank Name,City,State,Cert,Acquiring Institution,Closing Date,Fund Sort ascending
0,Pulaski Savings Bank,Chicago,Illinois,28611,Millennium Bank,"January 17, 2025",10548
1,The First National Bank of Lindsay,Lindsay,Oklahoma,4134,"First Bank & Trust Co., Duncan, OK","October 18, 2024",10547
2,Republic First Bank dba Republic Bank,Philadelphia,Pennsylvania,27332,"Fulton Bank, National Association","April 26, 2024",10546
3,Citizens Bank,Sac City,Iowa,8758,Iowa Trust & Savings Bank,"November 3, 2023",10545
4,Heartland Tri-State Bank,Elkhart,Kansas,25851,"Dream First Bank, N.A.","July 28, 2023",10544
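Because `read_html` always returns a list, it helps to check how many tables were found before indexing into it. The sketch below parses a small made-up HTML string instead of a live page, so it needs no network access; the table contents are assumptions, and it still requires one of the parsers listed above (lxml, or BeautifulSoup4 with html5lib).

```python
import io

import pandas as pd

# Made-up HTML with two tables, standing in for a downloaded web page.
html = """
<table>
  <thead><tr><th>Bank Name</th><th>City</th></tr></thead>
  <tbody><tr><td>Example Bank</td><td>Springfield</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Code</th><th>Value</th></tr></thead>
  <tbody>
    <tr><td>A</td><td>1</td></tr>
    <tr><td>B</td><td>2</td></tr>
  </tbody>
</table>
"""

# One DataFrame per <table> element, in document order.
tables = pd.read_html(io.StringIO(html))
print(len(tables))

# match= keeps only the tables whose text matches the given string or regex.
code_tables = pd.read_html(io.StringIO(html), match="Code")
print(list(code_tables[0].columns))
```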


In [245]:
df = pd.read_html('https://www.formula1.com/en/results/2025/races')
# pd.read_html() reads every <table> element on the page and returns a list of DataFrames, because a page may contain more than one table.

In [246]:
# df
# type(df)  #list
df[0]
# type(df[0]) #pandas.core.frame.DataFrame


Unnamed: 0,Grand Prix,Date,Winner,Car,Laps,Time
0,Australia,16 Mar 2025,Lando NorrisNOR,McLaren Mercedes,57,1:42:06.304
1,China,23 Mar 2025,Oscar PiastriPIA,McLaren Mercedes,56,1:30:55.026
2,Japan,06 Apr 2025,Max VerstappenVER,Red Bull Racing Honda RBPT,53,1:22:06.983
3,Bahrain,13 Apr 2025,Oscar PiastriPIA,McLaren Mercedes,57,1:35:39.435
4,Saudi Arabia,20 Apr 2025,Oscar PiastriPIA,McLaren Mercedes,50,1:21:06.758
5,Miami,04 May 2025,Oscar PiastriPIA,McLaren Mercedes,57,1:28:51.587


In [247]:
df[0]['Winner'] = df[0]['Winner'].str.slice(0, -3)
# str.slice() is one of pandas' string methods, which apply a string operation to every element of a Series (such as a DataFrame column).
# It slices each string the same way Python slices a str or a list: Series.str.slice(start, stop, step).
# Example: slice(1, 4) takes the characters from index 1 up to, but not including, index 4, so 'Alice' becomes 'lic'.
# Here slice(0, -3) drops the last three characters, the three-letter abbreviation appended to each winner's name.
df[0]

Unnamed: 0,Grand Prix,Date,Winner,Car,Laps,Time
0,Australia,16 Mar 2025,Lando Norris,McLaren Mercedes,57,1:42:06.304
1,China,23 Mar 2025,Oscar Piastri,McLaren Mercedes,56,1:30:55.026
2,Japan,06 Apr 2025,Max Verstappen,Red Bull Racing Honda RBPT,53,1:22:06.983
3,Bahrain,13 Apr 2025,Oscar Piastri,McLaren Mercedes,57,1:35:39.435
4,Saudi Arabia,20 Apr 2025,Oscar Piastri,McLaren Mercedes,50,1:21:06.758
5,Miami,04 May 2025,Oscar Piastri,McLaren Mercedes,57,1:28:51.587


In [248]:
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})
df1['Name'].str.slice(1, -2)

0      li
1        
2    harl
3      av
Name: Name, dtype: object
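One caveat with a positional slice such as `slice(0, -3)`: running the same cell twice removes three more characters each time. As a sketch of an alternative, the snippet below compares the slice with a regex-based `str.replace` that only strips a trailing run of three capital letters. The `winners` Series reuses names from the table above; the regex approach is a suggestion, not what the notebook itself uses.

```python
import pandas as pd

winners = pd.Series(["Lando NorrisNOR", "Oscar PiastriPIA", "Max VerstappenVER"])

# Positional slicing: str.slice(0, -3) and str[:-3] are equivalent.
trimmed_slice = winners.str.slice(0, -3)
trimmed_index = winners.str[:-3]

# Pattern-based removal: strip three capital letters only if they end the string.
trimmed_pattern = winners.str.replace(r"[A-Z]{3}$", "", regex=True)

# Applying the pattern again changes nothing, unlike a second positional slice.
trimmed_twice = trimmed_pattern.str.replace(r"[A-Z]{3}$", "", regex=True)

print(trimmed_slice.tolist())
print(trimmed_twice.equals(trimmed_pattern))
```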

____

# SQL (Optional)

* Note: If you are completely unfamiliar with SQL you can check out my other course: "Complete SQL Bootcamp" to learn SQL.

The pandas.io.sql module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction is provided by SQLAlchemy if installed. In addition you will need a driver library for your database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL. For SQLite this is included in Python’s standard library by default. You can find an overview of supported drivers for each SQL dialect in the SQLAlchemy docs.


If SQLAlchemy is not installed, a fallback is provided only for SQLite, using a connection from Python's built-in sqlite3 module. (Older pandas versions also had a deprecated MySQL fallback, which has since been removed.) This mode requires a Python database adapter that respects the Python DB-API.

See also some cookbook examples for some advanced strategies.

The key functions are:

* read_sql_table(table_name, con[, schema, ...])
    * Read SQL database table into a DataFrame.
* read_sql_query(sql, con[, index_col, ...])
    * Read SQL query into a DataFrame.
* read_sql(sql, con[, index_col, ...])
    * Read SQL query or database table into a DataFrame.
* DataFrame.to_sql(name, con[, schema, if_exists, ...])
    * Write records stored in a DataFrame to a SQL database.

In [249]:
from sqlalchemy import create_engine    # create_engine builds the SQLAlchemy engine that pandas uses to talk to a database.

In [250]:
engine = create_engine('sqlite:///:memory:')    # This creates a temporary SQLite database that lives only in memory.

In [251]:
dff.to_sql('my_table', engine)

4

In [253]:
sql_df = pd.read_sql('my_table',con=engine)

In [None]:
sql_df

Unnamed: 0,index,a,b,c,d
0,0,0,1,2,3
1,1,4,5,6,7
2,2,8,9,10,11
3,3,12,13,14,15
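The extra `index` column in the output above appears because `to_sql` writes the DataFrame index by default. Below is a minimal sketch of the same round trip using only the standard-library `sqlite3` module (the SQLite fallback mentioned earlier), with `index=False` to skip that column. The `frame` data and the query are assumptions for illustration.

```python
import sqlite3

import pandas as pd

frame = pd.DataFrame({"a": [0, 4, 8], "b": [1, 5, 9]})

# An in-memory SQLite database; it disappears when the connection is closed.
conn = sqlite3.connect(":memory:")

# index=False skips the index column; if_exists='replace' overwrites the table
# if it already exists, so the cell can be re-run without an error.
frame.to_sql("my_table", conn, index=False, if_exists="replace")

# With a plain sqlite3 connection, pass a SQL query rather than a table name.
result = pd.read_sql_query("SELECT a, b FROM my_table WHERE a > 0", conn)
conn.close()

print(list(result.columns))
print(result["a"].tolist())
```

Reading a whole table by name, as in `pd.read_sql('my_table', con=engine)`, relies on a SQLAlchemy engine; with a raw `sqlite3` connection a `SELECT` statement does the same job.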


# Great Job!