# Reading and writing different formats
This section covers the formats you commonly receive data in and how to convert between them with pandas: Excel files, CSV files, JSON, and of course SQL databases.

## Working with Excel files
- `xlrd` and `xlwt`: Older libraries for reading (`xlrd`) and writing (`xlwt`) legacy `.xls` files. Recent pandas releases have dropped `xlwt` support, so you only need these for legacy workflows.
- `openpyxl`: A popular library that pandas typically uses to read and write `.xlsx` files. In most modern cases this is the only one you'll need.
```
pip install xlwt xlrd openpyxl 
```
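The examples below never import these packages directly, but pandas relies on them behind the scenes (for example, `to_excel` and `read_excel` need an Excel engine such as `openpyxl` to be installed), so it's good to know they exist.
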
## Working with PostgreSQL
- `SQLAlchemy`: A popular SQL toolkit and ORM for Python; pandas uses it to talk to databases.
- `psycopg2-binary`: The PostgreSQL driver that SQLAlchemy uses to connect to a Postgres database.
```
pip install SQLAlchemy psycopg2-binary
```

## Reading URLs
pandas can read a URL straight into a data frame, as long as the URL returns data in a format pandas understands. JSON is one such format, which means you can point pandas at a JSON API and get a data frame back.

It isn't limited to JSON, though. If the URL points to a CSV file, use `pd.read_csv(someUrl)`; if it points to an Excel file, use `pd.read_excel(someUrl)`.
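
To make the "pick the reader that matches the data" idea concrete, here is a minimal sketch. The helper name `read_any` and the extension-to-reader mapping are hypothetical (they are not part of pandas); the sketch simply assumes the path or URL ends in a recognisable file extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

import pandas as pd

# Hypothetical mapping from file extension to the pandas reader to use.
READERS = {
    ".csv": pd.read_csv,
    ".tsv": lambda src, **kw: pd.read_csv(src, sep="\t", **kw),
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,
}


def read_any(source, **kwargs):
    """Read a local path or a URL into a data frame, choosing the reader by extension."""
    # urlparse also accepts plain paths; .path drops any "?query=string" part of a URL.
    suffix = PurePosixPath(urlparse(source).path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Don't know how to read {suffix!r} files")
    # Extra keyword arguments (e.g. index_col) are passed through to the reader.
    return READERS[suffix](source, **kwargs)
```

With this sketch, `read_any("../data/survey_results_public.csv", index_col="Respondent")` would end up calling `pd.read_csv` with the same arguments. Many API URLs have no file extension at all, in which case you'd call the matching reader (such as `pd.read_json`) directly.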

In [1]:
'''
- Ex.1 Reading and writing to CSV
'''
import pandas as pd
csvPath = "../data/survey_results_public.csv"

# 1. Reading a csv
df = pd.read_csv(csvPath, index_col="Respondent")

# 2. Build a boolean mask that selects only the rows whose 'Country' is India.
countryFilter = (df["Country"] == "India")

# 3. Apply the mask to get an India-only data frame, then write that data frame out as a csv
india_df = df.loc[countryFilter]
writePath = "../data/modified_survey_results.csv"
india_df.to_csv(writePath)
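
One detail worth seeing in isolation is what `index_col` does when a written CSV is read back. The sketch below uses a tiny made-up data frame and an in-memory buffer (both are assumptions for illustration, not the survey data) so it runs without any files on disk.

```python
import io

import pandas as pd

# Tiny stand-in for the survey data (made-up values).
df = pd.DataFrame(
    {"Country": ["India", "Ukraine", "India"], "Age": [25, 30, 41]},
    index=pd.Index([1, 2, 3], name="Respondent"),
)
india_df = df.loc[df["Country"] == "India"]

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
india_df.to_csv(buffer)
csv_text = buffer.getvalue()

# Without index_col, "Respondent" comes back as a regular column.
plain_df = pd.read_csv(io.StringIO(csv_text))
# With index_col, it becomes the index again, as in the original frame.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col="Respondent")

print(list(plain_df.columns))    # ['Respondent', 'Country', 'Age']
print(list(indexed_df.columns))  # ['Country', 'Age']
```

This is because `to_csv` writes the index as the first column by default, so the reader needs to be told which column to turn back into the index.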


In [None]:
r'''
- Ex.2: A tab-separated file (TSV) is basically a csv, but the columns are separated by
\t characters instead of commas. To read one in, pass sep="\t" along with the path to your .tsv file.
To write one, pass sep="\t" to to_csv as well.
'''
# 1. Read the source csv
df = pd.read_csv(csvPath, index_col="Respondent")

# 2. Build a boolean mask that selects only the rows whose 'Country' is India.
countryFilter = (df["Country"] == "India")

# 3. Write the India-only rows out as a tab-separated file
india_df = df.loc[countryFilter]
writePath = "../data/modified_survey_results.tsv"
india_df.to_csv(writePath, sep="\t")

# 4. Read the tsv back in, again passing sep="\t"
tsv_df = pd.read_csv(writePath, sep="\t", index_col="Respondent")
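
To see why `sep="\t"` matters on the reading side, here is a small sketch with a made-up two-row TSV string (an assumption for illustration) read from memory with and without the separator.

```python
import io

import pandas as pd

# Made-up tab-separated text standing in for a .tsv file.
tsv_text = "Respondent\tCountry\n1\tIndia\n2\tUkraine\n"

# The default separator is a comma, so each line is read as one big column.
wrong_df = pd.read_csv(io.StringIO(tsv_text))
# Passing sep="\t" splits the columns correctly.
right_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(wrong_df.shape)  # (2, 1)
print(right_df.shape)  # (2, 2)
```

pandas does not raise an error in the first case, so a single unexpectedly wide column is usually the sign that the separator is wrong.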

In [9]:
import pandas as pd
csvPath = "../data/survey_results_public.csv"
excelPath = "../data/survey_results_public.xlsx"

# 1. Read the csv and keep only the first 5 rows
df = pd.read_csv(csvPath, index_col="Respondent")
df = df.head(5)

# 2. Create an excel file at the specified path (needs an Excel engine such as openpyxl installed)
df.to_excel(excelPath)

# 3. Read the excel file back in and display the result
excelDf = pd.read_excel(excelPath, index_col="Respondent")
excelDf

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",Bosnia and Herzegovina,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Thailand,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,"Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,28.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Ukraine,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,30.0,Man,No,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy


In [10]:
'''
- Ex.3: By default the generated json looks very dictionary-like. Each column, such as
'username', becomes a key, and its value is another dictionary whose keys are the
index labels and whose values are the actual series values.

This is how you'd get the username value for index '1', after loading the json into a dict called data:
data['username']['1']

- Changing the JSON to be list-like instead of dictionary-like:
Pass in orient="records". Now each row becomes an object/dictionary with the column names as keys.
Adding lines=True writes one such object per line instead of a single array.
Note that the records orientation does not include the index.
I find this version a little easier to work with.
'''

import pandas as pd
csvPath = "../data/survey_results_public.csv"
jsonPath = "../data/survey_results_public.json"

# 1. Reading a csv
df = pd.read_csv(csvPath, index_col="Respondent")
df = df.head(5)

# 2. Create a json file at the specified path
df.to_json(jsonPath, orient="records", lines=True)
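
The difference between the orientations is easiest to see on a tiny example. The sketch below uses a made-up two-row data frame (the `username` and `age` values are assumptions for illustration) and calls `to_json` without a path, which returns the JSON as a string.

```python
import json

import pandas as pd

# Made-up data standing in for the survey rows.
df = pd.DataFrame({"username": ["alice", "bob"], "age": [30, 25]}, index=[1, 2])

# Default orientation: {column -> {index label -> value}}.
column_style = json.loads(df.to_json())
# orient="records": a list with one {column -> value} object per row.
record_style = json.loads(df.to_json(orient="records"))
# lines=True: one record per line rather than a single JSON array.
json_lines = df.to_json(orient="records", lines=True).splitlines()

print(column_style["username"]["1"])  # alice
print(record_style[0]["username"])    # alice
print(len(json_lines))                # 2
```

Note that the index labels become string keys in the default orientation, and that the records orientation drops them entirely.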

In [None]:
from sqlalchemy import create_engine
import pandas as pd
'''
- Ex. 4: We're going to read data from a csv, then insert that data into a postgres database.
Letting pandas do this saves us from creating the table and inserting every row by hand.
'''

df = pd.read_csv(csvPath, index_col="Respondent")
countryFilter = (df["Country"] == "India")
india_df = df.loc[countryFilter]
india_df = india_df.head(10) # Only save the first 10 rows to the database

# Via SQLAlchemy, connect to our postgres database with our credentials
dbUsername = "myUser"
dbPassword = "myPassword"
dbName = "myDatabase"
dbTableName = "my_first_table"
dbPort = 5000 # Change this to match your server (PostgreSQL's default port is 5432)

postgresEngine = create_engine(f"postgresql://{dbUsername}:{dbPassword}@localhost:{dbPort}/{dbName}")

# Create SQL table 'my_first_table' and insert the rows from the India data frame. With if_exists="replace",
# rerunning the script replaces an existing table with the new one instead of raising an error.
india_df.to_sql(dbTableName, postgresEngine, if_exists="replace")


# Read the entire table from postgres, which should only be 10 rows
sql_df = pd.read_sql(dbTableName, postgresEngine, index_col="Respondent")

# However, the big thing is that we're also able to run SQL queries via pandas, and store the results as data frames.
sql_query_df = pd.read_sql_query(f"SELECT * FROM {dbTableName}", postgresEngine, index_col="Respondent")
sql_query_df
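
If you don't have a Postgres server handy, the same write-then-query round trip can be tried against an in-memory SQLite database. This is only a stand-in sketch: SQLite replaces Postgres here, the data frame is made up, and the `?` placeholder style belongs to the `sqlite3` driver (other drivers use different placeholder syntax).

```python
import sqlite3

import pandas as pd

# Made-up data standing in for the survey rows.
df = pd.DataFrame(
    {"Country": ["India", "Ukraine", "India"], "Age": [25, 30, 41]},
    index=pd.Index([1, 2, 3], name="Respondent"),
)

# pandas accepts a plain sqlite3 connection, so no SQLAlchemy engine is needed here.
conn = sqlite3.connect(":memory:")
df.to_sql("my_first_table", conn, if_exists="replace")

# Pass values through params instead of formatting them into the SQL string.
sql_query_df = pd.read_sql_query(
    "SELECT * FROM my_first_table WHERE Country = ?",
    conn,
    params=("India",),
    index_col="Respondent",
)
conn.close()

print(len(sql_query_df))  # 2
```

Using `params` keeps values out of the SQL text itself, which is a safer habit than building queries with f-strings when the values come from user input.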

In [23]:
'''
- Ex. 5: Using urls 

'''
import pandas as pd

url = "https://api.coindesk.com/v1/bpi/currentprice.json"
url_df = pd.read_json(url)

url_df.head()

,time,disclaimer,chartName,bpi
updated,"Aug 21, 2024 04:33:38 UTC",This data was produced from the CoinDesk Bitco...,Bitcoin,
updatedISO,2024-08-21T04:33:38+00:00,This data was produced from the CoinDesk Bitco...,Bitcoin,
updateduk,"Aug 21, 2024 at 05:33 BST",This data was produced from the CoinDesk Bitco...,Bitcoin,
USD,,This data was produced from the CoinDesk Bitco...,Bitcoin,"{'code': 'USD', 'symbol': '&#36;', 'rate': '59..."
GBP,,This data was produced from the CoinDesk Bitco...,Bitcoin,"{'code': 'GBP', 'symbol': '&pound;', 'rate': '..."
