<a href="https://colab.research.google.com/github/MonkeyWrenchGang/MGTPython/blob/main/module_2/2_pandas_follow_along_Import_Export.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pandas Importing and Exporting Data


---

Pandas is a powerful library for data manipulation and analysis in Python, and it provides many convenient functions for importing and exporting data. In this notebook we are going to dive into importing and exporting data with pandas. 



# Import Libraries

Before we work with pandas, let's import the libraries this notebook needs and set a few notebook display options.

In [23]:
# -- notebook options -- 
from IPython.display import display, HTML, clear_output
display(HTML("<style>.container { width:90% }</style>"))
import warnings
warnings.filterwarnings('ignore')
# ------------------------------------------------------------------

# -- key libraries --
import pandas as pd
import sqlite3


# -- need this to render charts in notebook -- 
%matplotlib inline

# About Pandas! 
As noted above, pandas provides many convenient functions for importing and exporting data. Here are examples for a few common data formats:

CSV:

```python
import pandas as pd

# Importing a CSV file
df = pd.read_csv('data.csv')
print(df)

# Exporting a CSV file
df.to_csv('data.csv', index=False)

```
Excel:
```python
# Importing an Excel file
df = pd.read_excel('data.xlsx')
print(df)

# Exporting an Excel file
df.to_excel('data.xlsx', index=False)

```
JSON:
```python
# Importing a JSON file
df = pd.read_json('data.json')
print(df)

# Exporting a JSON file
df.to_json('data.json')

```

SQL:

```python
import sqlite3

# Importing data from a SQL database
con = sqlite3.connect("data.db")
df = pd.read_sql("SELECT * FROM data", con)
print(df)

# Exporting data to a SQL database, reusing the open connection
# (if_exists="replace" avoids an error because the table already exists)
df.to_sql("data", con, index=False, if_exists="replace")
```

*NOTE: Pandas provides functionality for connecting to and interacting with a variety of databases, including Oracle, DB2, Redshift, MySQL, Postgres and many others. The sqlalchemy library is often used to connect to remote databases via pandas.*


In addition to these file formats, pandas can also import and export data from other sources such as databases and web APIs. As a rule, use the pd.read_* functions to import data and the df.to_* methods to export it.

## Read and Write a CSV File 

A CSV (Comma Separated Values) file is a plain text file format that stores tabular data in which each line of the file represents a row of the table and each field (column) within that row is separated by a comma. CSV files are simple to create and can be easily imported and exported by most spreadsheets and databases.

For example:

```
Name,Age,Gender
Alice,25,Female
Bob,32,Male
Charlie,28,Male
```
As you can see, in the CSV file the first line is usually the header containing the names of each column, and the rest of the lines contain the data, each value separated by a comma.

It is also worth noting that sometimes CSV files use different delimiter characters such as semicolon (;) or tab (\t) instead of comma. 
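To illustrate the delimiter point, here is a minimal sketch that reads the same small table twice, once semicolon-delimited and once tab-delimited, using the sep argument of pd.read_csv. The data is held in memory with io.StringIO so no file is needed; the variable names are made up for this example.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited text, held in memory instead of a file
semicolon_text = "Name;Age;Gender\nAlice;25;Female\nBob;32;Male\nCharlie;28;Male\n"
people_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")

# The same table, tab-delimited
tab_text = semicolon_text.replace(";", "\t")
people_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

print(people_semicolon)
print(people_semicolon.equals(people_tab))  # both reads produce the same DataFrame
```

If sep is left at its default (a comma), each semicolon-delimited line would be read as a single column, which is usually the first sign that the delimiter is wrong.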



---
# Let's Import a CSV File!

- broward_listings.csv 
- AMZN.csv

Download the CSV files from Canvas to a location on your computer or on Google Drive. As a helper, I've also included links to the GitHub-hosted files.

GitHub:
- https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_2/data/broward_listings.csv
- https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_2/data/AMZN.csv
- https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_2/data/amazon.json

In [None]:
listings = pd.read_csv("https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_2/data/broward_listings.csv")
listings.head()

In [None]:
amzn = pd.read_csv("https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_2/data/AMZN.csv")
amzn.head()

# Export DataFrame to CSV and Excel

You can use the pandas.DataFrame.to_csv() method to export a DataFrame to a CSV file. This method takes several arguments, including the file path, the separator, and the options for handling missing data.

Similarly, you can use the pandas.DataFrame.to_excel() method to export a DataFrame to an Excel file. Given a file path, it writes the data to a sheet in a new Excel file, overwriting any existing file at that path.



```python

# Write the DataFrame to a CSV file
df.to_csv('data.csv', index=False)
df.to_csv('data.csv', sep='\t', header=False, na_rep='NaN')


# Write the DataFrame to an Excel file
df.to_excel('data.xlsx', index=False)
df.to_excel('data.xlsx', engine='openpyxl', sheet_name='Sheet1', startrow=1, startcol=1)

```
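As a rough illustration of what the to_csv arguments above change, the sketch below writes a tiny DataFrame to text instead of a file (to_csv returns a string when no path is given). The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame with one missing value
scores = pd.DataFrame({"name": ["Alice", "Bob"], "score": [90.5, None]})

# Comma separator, header row, no index column; missing values are left empty
csv_default = scores.to_csv(index=False)

# Tab separator, no header row, missing values written as the text "NaN"
csv_tabbed = scores.to_csv(sep="\t", header=False, index=False, na_rep="NaN")

print(csv_default)
print(csv_tabbed)
```

Printing the text first is a cheap way to check separator and missing-value settings before writing a large file.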

---



Let's export our datasets! 

1. Write AMZN to a CSV file
2. Write Listings to an Excel file




In [28]:
amzn.to_csv("amzn.csv")

In [29]:
listings.to_excel("listings.xlsx")

# Import & Export JSON

What is JSON? JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language. 

A JSON object is a collection of key-value pairs, where each key must be a string and each value can be a string, number, boolean, null, array, or another JSON object. Objects are delimited with curly braces {}, their key-value pairs are separated by commas, and each key is separated from its value by a colon. JSON arrays are delimited with square brackets [] and their elements are separated by commas.

Here's an example of a JSON object:
```json
{
  "name": "John Smith",
  "age": 35,
  "address": {
    "street": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "zip": "10021"
  },
  "phoneNumbers": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

```
Looks a lot like a Python dictionary, doesn't it?
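One way to see the resemblance is to parse JSON text with Python's standard json module, which turns a JSON object into a dict and a JSON array into a list. This is only a sketch on a shortened version of the object above; the second snippet (the flags variable) is a made-up example showing that JSON's true and null become True and None.

```python
import json

# A shortened version of the JSON object above, as text
json_text = """
{
  "name": "John Smith",
  "age": 35,
  "address": {"city": "New York", "state": "NY"},
  "phoneNumbers": [{"type": "home", "number": "212 555-1234"}]
}
"""

person = json.loads(json_text)  # JSON object -> Python dict
print(type(person))
print(person["address"]["city"])            # nested object -> nested dict
print(person["phoneNumbers"][0]["number"])  # JSON array -> Python list

# JSON literals true/false/null map to Python True/False/None
flags = json.loads('{"active": true, "nickname": null}')
print(flags)
```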

Anyway, let's import and export AMZN as JSON.

```python
# Read a JSON file
df = pd.read_json("data.json")

# Exporting a JSON file
df.to_json('data.json')

```
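The layout of the JSON that to_json writes depends on its orient argument. The sketch below, using a small made-up DataFrame, compares the default layout for a DataFrame ("columns") with orient="records", which writes one JSON object per row, and then reads the records text back in.

```python
import io
import pandas as pd

# Hypothetical price data
prices = pd.DataFrame({"ticker": ["AAA", "BBB"], "close": [10.5, 20.0]})

json_columns = prices.to_json()                  # default: keyed by column, then by index label
json_records = prices.to_json(orient="records")  # a JSON array with one object per row

print(json_columns)
print(json_records)

# Read the records-style text back into a DataFrame
round_trip = pd.read_json(io.StringIO(json_records), orient="records")
print(round_trip)
```

The records layout is often easier to hand to other tools, while the default layout keeps the index labels.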

In [None]:
# Importing JSON
amzn_2 = pd.read_json("https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_2/data/amazon.json")
amzn_2.head()

In [31]:
# Exporting a JSON file
amzn.to_json('amazon.json')

# SQLite?

SQLite is a relational database management system (RDBMS) contained in a C library. It is one of the most widely deployed SQL database engines in the world, and it is used in a wide variety of applications, including web browsers, mobile phones, operating systems, and embedded systems.

Because of its small size, low overhead, and good performance, SQLite is often the technology of choice for small applications, particularly on embedded systems and devices such as phones, tablets, smart appliances, and IoT gadgets, as well as for small and medium-sized websites. SQLite does not require a separate server process or system to operate, and it can read and write directly to ordinary disk files.

Let's do some SQLite stuff!
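To make the "no server, ordinary disk file" point concrete, here is a small sketch using only the standard library. It creates a database in a temporary directory, writes one row, and confirms that the database is just a file on disk. The file name example.db and the notes table are made up for illustration.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "example.db")  # hypothetical file name

    con = sqlite3.connect(db_path)  # creates the file if it does not exist
    con.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    con.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
    con.commit()
    rows = con.execute("SELECT id, body FROM notes").fetchall()
    con.close()

    # The whole database lives in this one file
    file_exists = os.path.isfile(db_path)
    file_size = os.path.getsize(db_path)

print(rows)
print(file_exists, file_size)
```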



In [32]:
import sqlite3


# Export Listings to SQLite

You can use the pandas.DataFrame.to_sql() method to export a DataFrame to a SQLite3 database. This method takes several arguments, including the name of the table, the connection object, and the table creation mode.

Here's an example of how you might export a DataFrame to an SQLite3 database:

```python

import pandas as pd
import sqlite3

# Create a DataFrame
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

# Connect to a database (or create one if it doesn't exist)
con = sqlite3.connect('data.db')

# Write the DataFrame to the table 'data' in the database
df.to_sql('data', con, if_exists='replace')
```
Here the if_exists argument is used to specify the action to take when the table already exists in the database. The options are "fail", "replace", and "append".
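The three modes can be compared side by side with an in-memory database (":memory:"), so nothing is written to disk. This is a sketch only; the row counts are read back with a COUNT(*) query, and the variable names are made up for this example.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
con = sqlite3.connect(":memory:")  # temporary in-memory database
count_query = "SELECT COUNT(*) AS n FROM data"

# "replace": create the table, dropping any existing one first
df.to_sql("data", con, index=False, if_exists="replace")

# "append": add the rows to the existing table
df.to_sql("data", con, index=False, if_exists="append")
rows_after_append = int(pd.read_sql_query(count_query, con)["n"].iloc[0])

# "replace" again: back to a single copy of the rows
df.to_sql("data", con, index=False, if_exists="replace")
rows_after_replace = int(pd.read_sql_query(count_query, con)["n"].iloc[0])

# "fail" (the default): raise an error because the table already exists
try:
    df.to_sql("data", con, index=False, if_exists="fail")
    failed = False
except ValueError as err:
    failed = True
    print("fail mode raised:", err)

con.close()
print(rows_after_append, rows_after_replace, failed)
```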





In [33]:
import sqlite3
# Connect to a database (or create one if it doesn't exist)
con = sqlite3.connect('my_listings.db')

# Write each DataFrame to its own table in the database
listings.to_sql('listings', con, if_exists='replace')
amzn.to_sql('amzn', con, if_exists='replace')

## How about some SQL?

You can use the pandas library to query a SQLite table by first connecting to the database using the sqlite3 library and then passing the query to the pandas.read_sql_query() function. 

Here is an example:
```python
import sqlite3
# Connect to the database
conn = sqlite3.connect("mydatabase.db")

# Construct the query
query = "SELECT * FROM mytable"

# Execute the query and store the results in a DataFrame
df = pd.read_sql_query(query, conn)

# Close the connection
conn.close()
```
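When part of a query comes from a variable, one option is to pass the values through the params argument of pd.read_sql_query rather than building the SQL string by hand. With sqlite3 the placeholder is a question mark. The sketch below uses an in-memory database and a small made-up table so it runs on its own.

```python
import sqlite3
import pandas as pd

# Build a small hypothetical table in an in-memory database
conn = sqlite3.connect(":memory:")
sample = pd.DataFrame({
    "neighbourhood": ["West Park", "Hollywood", "West Park"],
    "price": [100, 150, 80],
})
sample.to_sql("mytable", conn, index=False)

# Each ? is filled in, in order, from the params tuple
query = "SELECT * FROM mytable WHERE neighbourhood = ? AND price <= ?"
result = pd.read_sql_query(query, conn, params=("West Park", 90))

conn.close()
print(result)
```

Letting the database driver substitute the values also avoids quoting problems when a value contains an apostrophe.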


---

We created a database called my_listings.db that contains two tables, listings and amzn. Let's run a few queries against the listings table and see what happens.

1. "SELECT * FROM listings WHERE neighbourhood = 'West Park'" 
2. "SELECT neighbourhood, count(*) as count FROM listings GROUP BY neighbourhood ORDER BY count DESC LIMIT 10"

3. "SELECT room_type, AVG(price) as mean_price FROM listings GROUP BY room_type"





In [36]:
import sqlite3
# Connect to the database
conn = sqlite3.connect("my_listings.db")

# Construct the query
query1 = "SELECT * FROM listings WHERE neighbourhood = 'West Park'"

# Execute the query and store the results in a DataFrame
result1 = pd.read_sql_query(query1, conn)
result1.head()

In [None]:
# Construct the query
query2 = "SELECT neighbourhood, count(*) as count FROM listings GROUP BY neighbourhood ORDER BY count DESC LIMIT 10"

# Execute the query and store the results in a DataFrame
result2 = pd.read_sql_query(query2, conn)
result2

In [None]:
# Construct the query
query3 = "SELECT room_type, AVG(price) as mean_price FROM listings GROUP BY  room_type"

# Execute the query and store the results in a DataFrame
result3 = pd.read_sql_query(query3, conn)
result3