## Trial code

In [2]:
!find /c /v "" data\earthquakes.csv


---------- DATA\EARTHQUAKES.CSV: 9333


In [1]:
!dir data | findstr "earthquakes.csv"

06/20/2022  04:17 PM         3,524,989 earthquakes.csv


In [9]:
%%bash
head -n 2 data/earthquakes.csv

alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,mmi,net,nst,place,rms,sig,sources,status,time,title,tsunami,type,types,tz,updated,url
,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ci37389218&format=geojson,0.008693,,85.0,",ci37389218,",1.35,ml,,ci,26.0,"9km NE of Aguanga, CA",0.19,28,",ci,",automatic,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventpage/ci37389218


In [18]:

files = !dir data
[file for file in files if 'earthquake' in file]

['06/20/2022  04:17 PM         3,524,989 earthquakes.csv']

In [1]:
import sqlite3

In [4]:
con = sqlite3.connect('data/quakes.db')
cursor = con.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cursor.fetchall())
cursor.close()
con.close()

[('tsunamis',)]
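
Once the table names are known, a table can be read straight into a DataFrame. The cell below is a minimal sketch of that step, not the notebook's actual code: it uses an in-memory database as a stand-in for `data/quakes.db`, and the `place` and `mag` columns and their rows are invented for illustration.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for data/quakes.db; the schema and rows are hypothetical.
con = sqlite3.connect(":memory:")
try:
    con.execute("CREATE TABLE tsunamis (place TEXT, mag REAL)")
    con.executemany(
        "INSERT INTO tsunamis VALUES (?, ?)",
        [("Place A", 6.7), ("Place B", 5.9)],
    )
    # read_sql_query accepts a sqlite3 connection and returns a DataFrame.
    tsunamis = pd.read_sql_query("SELECT * FROM tsunamis", con)
finally:
    con.close()

print(tsunamis.shape)
```

With the real file, the same `pd.read_sql_query()` call would presumably take the connection opened on `data/quakes.db` instead.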


## Notes

- A DataFrame consists of pandas Series objects (its columns) that share an Index
- Pandas DataFrames can be created in a variety of ways
  - Python objects: dictionaries, list of dictionaries, tuples
  - A file, e.g. a .csv file
  - A database, e.g. using sqlite3
  - An API
- General guidelines when creating a DataFrame:
  - Check that data was actually loaded with the `.empty` attribute
  - Check the shape with the `.shape` attribute
  - Inspect the contents with the `.head()` or `.tail()` method
  - Check data types with the `.dtypes` attribute (`.dtype` on a single Series)
  - Describe and summarize the data with the `.describe()` method, which covers only the numeric columns by default
    - Object columns can be summarized with `include='all'` (every column) or `include=object` (object columns only) in the `.describe()` method; the older `include=np.object` spelling fails on recent NumPy versions, where the `np.object` alias was removed
    - Pg. 79 contains a list of other useful calculation methods for Series and DataFrames
    - Pg. 81 contains a list of useful methods for the Index
  - Other useful methods to help summarize the data include the following:
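
As a rough sketch of the inspection guidelines above (emptiness, shape, contents, data types, summary statistics), the snippet below runs each check on a tiny DataFrame. The frame is an assumption standing in for the earthquake data: only the column names come from the CSV header, and the values are made up.

```python
import pandas as pd

# Small illustrative frame; the values are invented, not real records.
df = pd.DataFrame(
    {
        "mag": [1.35, 4.2, 2.8],
        "magType": pd.Series(["ml", "mb", "ml"], dtype=object),
        "tsunami": [0, 1, 0],
    }
)

print(df.empty)    # False when data was actually loaded
print(df.shape)    # (number of rows, number of columns)
print(df.head(2))  # first two rows
print(df.dtypes)   # one dtype per column

numeric_summary = df.describe()                 # numeric columns only
object_summary = df.describe(include=[object])  # object columns only
full_summary = df.describe(include="all")       # every column
```
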
- Subsetting the data is useful to break out data of interest for analysis
  - Sometimes you don't need all of the data for a particular study
  - Subsetting can be performed either by indexing with the column name (`df['column']`) or by attribute access (`df.column`)
    - A list of column names can be passed when indexing to select multiple columns
  - String methods, accessed through `.str`, are useful when working with object (string) columns
    - `.str.startswith()`, `.str.endswith()`, and `.str.contains()` are examples of useful string methods
    - `.isin()` (a Series method rather than a string method) is useful when matching against a list of exact strings
    - `.str.contains()` is useful when searching for partial matches; a list of terms can be combined into one regular expression with `.join()`, e.g. `'|'.join(List)`
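
The subsetting and string-matching notes above might look like the following in practice. This is a sketch under assumptions: the first `place` value is taken from the CSV sample earlier, while the other rows and the search terms are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # First place comes from the sample row; the other two are invented.
        "place": ["9km NE of Aguanga, CA", "Hypothetical Town, Alaska", "Example Bay, Japan"],
        "mag": [1.35, 2.1, 5.0],
        "magType": ["ml", "ml", "mww"],
    }
)

# One column by name, or by attribute when the name is a valid identifier.
mags = df["mag"]
same_mags = df.mag

# A list of names selects several columns at once.
subset = df[["place", "mag"]]

# String methods live under the .str accessor and return Boolean Series.
ends_ca = df["place"].str.endswith(", CA")

# Partial matches against several terms: join them into one regex alternation.
terms = ["Alaska", "Japan"]
partial = df["place"].str.contains("|".join(terms))

# Exact matches against a list of values use Series.isin().
exact = df["magType"].isin(["ml", "md"])
```
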
- Slicing can be used to select specific rows and can be chained with subsetting
- Indexing operations can combine subsetting and slicing
  - `.loc[]` performs label-based lookups, and its slices include the end label
  - `.iloc[]` performs integer position-based lookups, and its slices exclude the end position
  - These can also be chained
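
To make the difference between chained slicing, `.loc[]`, and `.iloc[]` concrete, here is a small sketch that selects the same rows and columns three ways. The string index labels and the values are assumptions chosen so that labels and positions cannot be confused.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "mag": [1.35, 2.1, 5.0, 3.3],
        "magType": ["ml", "ml", "mww", "mb"],
        "tsunami": [0, 0, 1, 0],
    },
    index=["a", "b", "c", "d"],  # hypothetical labels
)

# Column subsetting chained with a positional row slice (end excluded).
chained = df[["mag", "magType"]][1:3]

# .loc uses labels, and the end label "c" is included.
by_label = df.loc["b":"c", ["mag", "magType"]]

# .iloc uses integer positions, and the end position 3 is excluded.
by_position = df.iloc[1:3, [0, 1]]
```

All three results hold rows `b` and `c` with the `mag` and `magType` columns. A single `.loc[]` or `.iloc[]` call is usually preferable to chaining when the result will be assigned to.
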
- Filtering with Boolean masks can be performed within indexing operations and subsetting
  - Combining masks requires bitwise operators, e.g. `&`, `|`, with each condition wrapped in parentheses
  - The `.idxmax()` and `.idxmin()` methods are useful for finding the index of the maximum and minimum values
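
The filtering notes above can be sketched as follows. The data and the magnitude threshold of 6.0 are illustrative assumptions, not values from the earthquake file.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "mag": [1.35, 6.2, 5.0, 7.1],
        "tsunami": [0, 1, 0, 1],
        "magType": ["ml", "mww", "mb", "mww"],
    }
)

# Each condition gets its own parentheses because & and | bind more
# tightly than comparison operators such as >= and ==.
strong_with_tsunami = df[(df["mag"] >= 6.0) & (df["tsunami"] == 1)]

# The same kind of mask inside .loc, selecting columns at the same time.
strong_or_mb = df.loc[
    (df["mag"] >= 6.0) | (df["magType"] == "mb"), ["mag", "magType"]
]

# idxmax()/idxmin() return the index label of the largest/smallest value,
# which can be passed back to .loc to fetch the whole row.
largest = df["mag"].idxmax()
smallest = df["mag"].idxmin()
strongest_row = df.loc[largest]
```
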
- Sometimes data needs to be added or removed from a DataFrame, especially during the exploratory data analysis loop
  - New columns can be created by indexing with a new column name and assigning the data
  - The `.assign()` method can be used to add data, but it follows different syntax: new columns are passed as keyword arguments and a new DataFrame is returned instead of the original being modified
    - Lambda functions are useful in the `.assign()` method
  - The `pd.concat()` function can be used to combine DataFrames and add data
    - An example of this workflow could be to import new data with `pd.read_csv()` and its optional `usecols` parameter to bring in only the new columns. The new DataFrame is then concatenated with the old one to add the new data. If the two share the same index, passing `axis=1` appends the columns of the new data to the old.
    - Joins (the `join` parameter of `pd.concat()`) can be used to help manage the concatenation
  - The `del` statement can be used to delete columns from a DataFrame, e.g. `del df['column']`
  - The `.pop()` method can be used to remove a column from a DataFrame while returning it, so the column can be assigned to a different variable, e.g. removing a mask column created earlier so it can be used at a later time
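
The ways of adding and removing columns listed above can be sketched end to end as below. All column names other than `mag`, `place`, and `tsunami` are hypothetical, and the `extra` frame is built inline as a stand-in for data that would normally come from `pd.read_csv(..., usecols=[...])`.

```python
import pandas as pd

df = pd.DataFrame({"mag": [1.35, 6.2, 5.0], "place": ["A", "B", "C"]})

# New column by assignment; this modifies df in place.
df["strong"] = df["mag"] >= 5.0

# .assign() returns a new DataFrame. Each lambda receives the frame being
# built, so a later keyword can use a column created by an earlier one.
df = df.assign(
    mag_rounded=lambda x: x["mag"].round(),
    very_strong=lambda x: x["mag_rounded"] >= 6,
)

# Stand-in for freshly imported columns that share df's index.
extra = pd.DataFrame({"tsunami": [0, 1, 0]}, index=df.index)

# axis=1 aligns on the index and appends the new columns.
df = pd.concat([df, extra], axis=1)

# del removes a column in place; pop() removes it and returns it as a Series.
del df["mag_rounded"]
strong_mask = df.pop("strong")

print(list(df.columns))
```

Keeping the popped mask in `strong_mask` means it can still be used later, e.g. `df[strong_mask]`, without cluttering the DataFrame itself.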