# Data Science Cheat Sheet and Examples

## By Weng Fei Fung

Here I explore basic ways to load data (from a web API, JSON strings, Excel/CSV files, and TSV files) into pandas DataFrames for analysis and table printing. I also explore some basic plotting with matplotlib. This will become a basic cheat sheet.

In [107]:
import json
import requests
import pandas as pd

# Python multiline strings are delimited with triple quotes: """...""" or '''...'''
print("""
-------------------------------------------------------------------------
Request API, Perform Dictionary Manipulation, Then Load into Dataframe
-------------------------------------------------------------------------""")

response = requests.get("https://pokeapi.co/api/v2/pokemon/pikachu", timeout=10)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page

# Convert the API response body to a dictionary
# (named `pokemon` rather than `dict` so the built-in dict type is not shadowed)
pokemon = response.json()

# Dump the dictionary back into a JSON string
json_str = json.dumps(pokemon)

# Slice the first 33 characters of the JSON string
print("Pikachu's abilities from pokedex API\n" + json_str[0:33] + "...")

print("\n")

# Display a DataFrame
# A list comprehension builds a list from a for-loop in a single expression
abilities = [json.dumps(row["ability"]) for row in pokemon["abilities"]]
df0 = pd.DataFrame(abilities, columns=["ability_json"])
print(df0)

# ================================================================================================ #



-------------------------------------------------------------------------
Request API, Perform Dictionary Manipulation, Then Load into Dataframe
-------------------------------------------------------------------------
Pikachu's abilities from pokedex API
{"abilities": [{"ability": {"name...


                                        ability_json
0  {"name": "static", "url": "https://pokeapi.co/...
1  {"name": "lightning-rod", "url": "https://poke...
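Storing each ability as a JSON string keeps the nesting intact. If separate columns are wanted, `pd.json_normalize` can flatten the nested dictionaries instead. A minimal sketch, assuming the response has the shape shown above; the `sample` dictionary is hand-written to mirror that shape (the `is_hidden` and `slot` fields and the URL values are illustrative, not copied from a live request), so no network call is needed:

```python
import pandas as pd

# Hand-written stand-in for response.json(); only the "abilities" key is reproduced,
# and the values are illustrative rather than taken from a live API call.
sample = {
    "abilities": [
        {"ability": {"name": "static", "url": "https://pokeapi.co/api/v2/ability/9/"},
         "is_hidden": False, "slot": 1},
        {"ability": {"name": "lightning-rod", "url": "https://pokeapi.co/api/v2/ability/31/"},
         "is_hidden": True, "slot": 3},
    ]
}

# json_normalize turns each list item into a row and joins nested keys with "."
abilities = pd.json_normalize(sample["abilities"])
print(abilities[["ability.name", "ability.url", "is_hidden"]])
```

Compared with `json.dumps`, this gives one column per field, so the names can be filtered or sorted directly.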


In [100]:
from io import StringIO
import pandas as pd

print("""
----------------------------------
Load JSON String into Dataframes
And Renaming Row Indexes
----------------------------------
""")

# Each dictionary in the list becomes a row, and the JSON keys become the column labels.
# Wrapping the literal string in StringIO avoids the FutureWarning that pandas 2.1+ raises
# when read_json is given a raw JSON string.
# Note: read_json infers dtypes, so "0800" is parsed as the integer 800 (see the output).
df1 = pd.read_json(StringIO('[{"Sleep Onset":"2300","Woke":"0800"},{"Sleep Onset":"2300","Woke":"0800"}]'))

# Rename the row indexes from 0, 1, ... to dates
df1 = df1.rename(index={0: '2023-01-01', 1: '2023-01-02'})
print(df1)




----------------------------------
Load JSON String into Dataframes
And Renaming Row Indexes
----------------------------------

            Sleep Onset  Woke
2023-01-01         2300   800
2023-01-02         2300   800
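The output above shows a gotcha: `read_json` infers dtypes, so the string `"0800"` became the integer `800` and lost its leading zero. A minimal sketch of one fix, passing `dtype=False` so the values stay as strings; assigning the date labels directly to `df.index` is shown here as an alternative to `rename`:

```python
from io import StringIO
import pandas as pd

raw = '[{"Sleep Onset":"2300","Woke":"0800"},{"Sleep Onset":"2300","Woke":"0800"}]'

# dtype=False turns off dtype inference, so "0800" stays a string instead of becoming 800
df = pd.read_json(StringIO(raw), dtype=False)

# Assign the date labels directly instead of mapping 0 and 1 through rename()
df.index = ["2023-01-01", "2023-01-02"]
print(df)
```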


In [106]:
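from pathlib import Path
import pandas as pd

print("""
-----------------------------------------
Load CSV file into Dataframe
From Converting Excel tab into CSV File
-----------------------------------------
""")

# Read one worksheet (tab) of an Excel workbook; reading .xlsx requires the openpyxl package
df2 = pd.read_excel("data/Runs.xlsx", sheet_name="Protocol")

# Write to a CSV file, creating the reports/ folder first if it does not exist
Path("reports").mkdir(exist_ok=True)
df2.to_csv("reports/Runs.csv", index=False)

# Read the CSV file back into a DataFrame
df22 = pd.read_csv("reports/Runs.csv")

# Number rows from 1 instead of 0 (works for any row count, unlike a hard-coded rename map)
df22.index = df22.index + 1
print(df22)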


-----------------------------------------
Load CSV file into Dataframe
From Converting Excel tab into CSV File
-----------------------------------------

   Stage           Duration  \
1      1   20-30 mins total   
2      2   45-60 mins total   
3      3   60-90 mins total   
4      4  90-120 mins total   

                                Running instructions  \
1  You will be slower than the other runners at f...   
2  Try minimizing walking breaks and focus on mai...   
3  Try minimizing walking breaks and focus on mai...   
4  Try minimizing walking breaks and focus on mai...   

                   Frequency days/week  \
1   3+ times / week to build endurance   
2  4-5 times / week to build endurance   
3  5-6 times / week to build endurance   
4  6-7 times / week to build endurance   

                                            Distance  \
1                                        Don't worry   
2  Add extra km or mile to one of your runs each ...   
3  Aim to increase your week
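The CSV round trip can be tried without the workbook. A minimal sketch, assuming a small hand-built DataFrame (two columns copied from the output above) stands in for the "Protocol" sheet and an in-memory `StringIO` buffer stands in for `reports/Runs.csv`:

```python
from io import StringIO
import pandas as pd

# Stand-in for the "Protocol" sheet, using two columns from the output above
protocol = pd.DataFrame({
    "Stage": [1, 2, 3, 4],
    "Duration": ["20-30 mins total", "45-60 mins total",
                 "60-90 mins total", "90-120 mins total"],
})

# index=False keeps the 0..n-1 row labels out of the file,
# so reading it back does not produce an extra "Unnamed: 0" column
buffer = StringIO()
protocol.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

# Shift the labels so rows are numbered 1..n for any number of rows
round_trip.index = round_trip.index + 1
print(round_trip)
```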

In [109]:
import pandas as pd

print("""
-----------------------------------------
Load TSV formatted file into Dataframe
And show first 10 lines using head
-----------------------------------------
""")

# Read a .txt or .tsv file into a DataFrame.
# It is read the same way as a CSV file, except the separator is a tab character ("\t").
# The "Unnamed: 0" column in the output is a row index that was saved into the file;
# passing index_col=0 would use it as the row index instead of a data column.
df3 = pd.read_csv("data/Popular Videogames.txt", sep="\t")
print(df3.head(10))


-----------------------------------------
Load TSV formatted file into Dataframe
And show first 10 lines using head
-----------------------------------------

   Unnamed: 0                                    Title  Release Date  \
0           0                               Elden Ring  Feb 25, 2022   
1           1                                    Hades  Dec 10, 2019   
2           2  The Legend of Zelda: Breath of the Wild  Mar 03, 2017   
3           3                                Undertale  Sep 15, 2015   
4           4                            Hollow Knight  Feb 24, 2017   
5           5                                Minecraft  Nov 18, 2011   
6           6                                    Omori  Dec 25, 2020   
7           7                            Metroid Dread  Oct 07, 2021   
8           8                                 Among Us  Jun 15, 2018   
9           9                           NieR: Automata  Feb 23, 2017   
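The `Unnamed: 0` column appears when a file was saved together with its row index. A minimal sketch, assuming a two-row in-memory excerpt in the same layout as the file above: `index_col=0` turns that column back into the index, and `pd.to_datetime` with an explicit format parses the `Release Date` strings:

```python
from io import StringIO
import pandas as pd

# Two-row excerpt in the same layout as the file above: an unnamed index column, then the data
tsv = (
    "\tTitle\tRelease Date\n"
    "0\tElden Ring\tFeb 25, 2022\n"
    "1\tHades\tDec 10, 2019\n"
)

# sep="\t" splits on tabs; index_col=0 uses the first column as the row index
games = pd.read_csv(StringIO(tsv), sep="\t", index_col=0)

# Parse strings such as "Feb 25, 2022" into datetime64 values
games["Release Date"] = pd.to_datetime(games["Release Date"], format="%b %d, %Y")
print(games.head(10))
```

Parsing the dates up front makes sorting by release date chronological rather than alphabetical.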

                                               