#### Misc Python commands
`os.getcwd()`: returns the current working directory of the Python process.<br>
`os.listdir(path)`: returns a list of the entries in the specified directory (the current directory if no path is given).
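
A minimal sketch of both calls. The temporary directory and the file names `a.csv` and `b.txt` are made up for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so the listing is predictable.
with tempfile.TemporaryDirectory() as tmp_dir:
    # Create two empty files (hypothetical names).
    for name in ("a.csv", "b.txt"):
        open(os.path.join(tmp_dir, name), "w").close()

    cwd = os.getcwd()                    # absolute path of the current working directory
    files = sorted(os.listdir(tmp_dir))  # listdir order is arbitrary, so sort it

print(cwd)
print(files)
```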

### Python intro to importing data (data camp)

`open('file_name', 'mode')`
	Opens a file in the specified mode (for example `'r'` to read or `'w'` to write) and returns a file object.

`with open('file_name') as name_of_file:`
	Opens the file in a context manager, which binds the file object to `name_of_file` and closes it automatically when the block ends, so no explicit `close()` call is needed.

`file.readline()`
	Reads a single line from the file; each subsequent call returns the next line.
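
The three pieces above fit together roughly as follows. This is a sketch only: the file `sample.txt` and its contents are invented so the example can run on its own.

```python
import os
import tempfile

# Create a small sample file (hypothetical content) to read back.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# The context manager closes the file when the block ends, even on error.
with open(path, "r") as f:
    line_1 = f.readline()  # first line, including its trailing newline
    line_2 = f.readline()  # second line
    line_3 = f.readline()  # empty string signals end of file

print(repr(line_1), repr(line_2), repr(line_3))
print(f.closed)  # the file object is already closed here
```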

Flat files are plain-text tables such as CSVs or TSVs, with one record per row.
	They carry no relationships between tables, so relational logic cannot be applied to them directly.

`np.loadtxt(file, delimiter=' ', skiprows=num_rows, usecols=[m, n, e], dtype=datatype)`
	Loads numeric data from a text file into a NumPy array.
	`delimiter` needs to be specified unless the file is whitespace-delimited (the default).
	`skiprows` skips the given number of rows before reading.
	`usecols` reads only the columns at the listed column indices.
	Not really meant for mixed data.
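
A small sketch of `np.loadtxt` with these arguments. The in-memory CSV text is made up, and `io.StringIO` stands in for a real file on disk.

```python
import io

import numpy as np

# Hypothetical data: one header row, then two numeric rows.
raw = io.StringIO("x,y,z\n1,2,3\n4,5,6\n")

arr = np.loadtxt(
    raw,
    delimiter=",",   # columns are comma-separated
    skiprows=1,      # skip the header row
    usecols=[0, 2],  # keep only the first and third columns
    dtype=float,
)
print(arr)
```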

`np.genfromtxt(filename, delimiter='separator', names=True/False, dtype=datatype)`
	Can import data of mixed types into an array; this is done by specifying `dtype=None`.
	`names=True` indicates that the first row is a header of column names.
	Retrieve rows with `array[n]` and columns with `array['col_name']`.
	The result is a one-dimensional structured array (one element per row), not a two-dimensional table.
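
A sketch of the mixed-type case, assuming a made-up two-column file with a header row. `encoding="utf-8"` is added here so that the string column comes back as text rather than bytes.

```python
import io

import numpy as np

# Hypothetical data: a text column and a numeric column.
raw = io.StringIO("name,score\nann,1.5\nbob,2.0\n")

data = np.genfromtxt(raw, delimiter=",", names=True, dtype=None, encoding="utf-8")

print(data.shape)        # one-dimensional: one element per row
print(data.dtype.names)  # column names taken from the header
print(data["score"])     # select a column by name
print(data[0])           # select a row by position
```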

`np.recfromcsv(filename)`
	Similar to `genfromtxt`, but defaults to `dtype=None`, `names=True`, and a comma delimiter, i.e. it assumes the file is a CSV.

`pandas_frame.values`
	Attribute (not a method) that returns the DataFrame's data as a NumPy array; `pandas_frame.to_numpy()` is the equivalent method.

`pd.read_csv(filename, sep='separator', comment='char', na_values=values)`
	`comment` ignores the rest of a line after the specified character.
	`na_values` lists additional strings that should be treated as missing values (NaN).
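
A sketch of how `comment` and `na_values` behave, using invented semicolon-separated text in place of a real file.

```python
import io

import pandas as pd

# Hypothetical data: an inline comment, a full comment line, and a custom
# marker ("missing") for absent values.
raw = io.StringIO(
    "id;value\n"
    "1;10# inline note\n"
    "2;missing\n"
    "# a full comment line\n"
    "3;30\n"
)

df = pd.read_csv(raw, sep=";", comment="#", na_values=["missing"])
print(df)
```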


#### Special file types
**Pickled Files**: Python's native serialization format, which stores Python objects as a byte stream.
Below is the standard process for importing a pre-pickled file into Python.
Note that `'rb'` opens the file for reading in binary mode, which is required because all pickled files are binary.
```
import pickle
with open('pickled_file.pkl','rb') as file:
    data = pickle.load(file)
print(data)
```
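
For completeness, a sketch of the full round trip, including how such a file might be written in the first place. The `record` dict and the file name are illustrative. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

record = {"name": "ann", "scores": [1, 2, 3]}  # hypothetical object to store
path = os.path.join(tempfile.mkdtemp(), "record.pkl")

# 'wb' = write, binary mode
with open(path, "wb") as f:
    pickle.dump(record, f)

# 'rb' = read, binary mode
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```
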
**Excel Sheets**: Example of how to import excel files is below
```
import pandas as pd
file = 'urbanpop.xlsx'
data = pd.ExcelFile(file) # ExcelFile is excel file import method.
print(data.sheet_names)
```
The code above opens the Excel file and prints its sheet names.<br>
Use `data.parse('sheet_name')` to load a particular sheet into a DataFrame.<br>
You can also use `data.parse(n)`, where n is the zero-based index of the sheet, <br>
running from 0 to m-1 for a file with m sheets.


#### Importing SAS/Stata Files with Pandas
- The most common SAS files have the extensions `.sas7bdat` and `.sas7bcat`, which are dataset and catalog files respectively.<br><br>
Method to import SAS files below:
```
import pandas as pd
from sas7bdat import SAS7BDAT  # SAS7BDAT works as a context manager for opening the SAS file
with SAS7BDAT('file_name.sas7bdat') as file:
    df_sas = file.to_data_frame()
```
<br><br>
Method to import Stata files:
```
import pandas as pd
data = pd.read_stata('file_name.dta')
```

#### Importing HDF5 files
Method to import hdf5 files:
```
import h5py
filename = 'file_name.hdf5'
data = h5py.File(filename,'r')
print(type(data))  # <class 'h5py._hl.files.File'>
```
<br><br>
The h5py `File` object can be queried like a dictionary. You can list the keys under a group with `for key in data['sub-key'].keys(): print(key)`<br>
The `['sub-key']` lookup can be dropped to list the top-level keys instead.
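
A self-contained sketch of the dictionary-style access. The file is built on the spot, and the group name `sub-key` and dataset name `values` are placeholders.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")

# Build a tiny file: one group ("sub-key") holding one dataset ("values").
with h5py.File(path, "w") as f:
    group = f.create_group("sub-key")
    group.create_dataset("values", data=[1, 2, 3])

with h5py.File(path, "r") as data:
    top_level = list(data.keys())          # keys at the root of the file
    nested = list(data["sub-key"].keys())  # keys inside the group
    # Slicing a dataset copies its contents into a NumPy array.
    values = data["sub-key"]["values"][:].tolist()

print(top_level, nested, values)
```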

#### Importing MATLAB files
MATLAB files can be read and written with the SciPy functions `scipy.io.loadmat()` and `scipy.io.savemat()` respectively. A .mat file contains the variables saved from a MATLAB workspace.<br><br>
Method to import MATLAB files:
```
import scipy.io
filename = 'file_name.mat'
mat = scipy.io.loadmat(filename)
```
<br>
MATLAB files loaded into Python are stored as dicts. Each key is the name of a MATLAB variable and each value is the object stored under that name. Keys for actual MATLAB variables are not surrounded by double underscores; keys such as `__header__`, `__version__`, and `__globals__` hold file metadata.
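
A sketch of separating the real variables from the metadata keys. The variable `x` is invented, and an in-memory buffer stands in for a .mat file on disk.

```python
import io

import numpy as np
import scipy.io

# Write a hypothetical variable "x" to an in-memory .mat file, then read it back.
buffer = io.BytesIO()
scipy.io.savemat(buffer, {"x": np.array([1, 2, 3])})
buffer.seek(0)
mat = scipy.io.loadmat(buffer)

# Keep only keys that are not metadata (metadata keys start with "__").
variable_names = [key for key in mat if not key.startswith("__")]

print(sorted(mat.keys()))
print(variable_names)
print(mat["x"].shape)  # MATLAB arrays are at least 2-D, so this is a row vector
```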

#### Relational Databases in Python
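How to create a database engine in Python (using SQLAlchemy):
```
from sqlalchemy import create_engine
engine = create_engine('db_type:///db_name')  # URL format is db_type:///db_name
table_names = engine.table_names() # stores the names of the db tables in a variable as a list (removed in SQLAlchemy 2.0; use sqlalchemy.inspect(engine).get_table_names() there)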
```
<br><br>
How to query a relational database in python
```
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('db_type:///db_name')
con = engine.connect() # need to create connection variable
rs = con.execute("Query Logic") # connection.execute runs the SQL query (SQLAlchemy 2.0 requires wrapping the string in sqlalchemy.text())
df = pd.DataFrame(rs.fetchall()) # retrieves result set from connection and stores to DF
df.columns = rs.keys() # applies column names from query to DF that you will manipulate
con.close() 
```
<br><br>
Using context manager to retrieve query results(removes need to open/close connection):
```
with engine.connect() as con:
    rs = con.execute("Query Logic")
    df = pd.DataFrame(rs.fetchmany(size=5)) # only fetches specific num of records
    df.columns = rs.keys() 
```
<br><br>
How to query db directly with Pandas:
```
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('db_type:///db_name')
df = pd.read_sql_query("Query Logic", engine)
``` 
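
The connect/execute/fetch pattern can be tried end to end against a throwaway SQLite file. This is a sketch under assumptions: the `fruit` table and its rows are invented, and queries are wrapped in `sqlalchemy.text()`, which newer SQLAlchemy releases require for raw SQL strings.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine, text

# Hypothetical SQLite database stored in a temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
engine = create_engine(f"sqlite:///{db_path}")

# engine.begin() opens a connection and commits when the block succeeds.
with engine.begin() as con:
    con.execute(text("CREATE TABLE fruit (name TEXT, qty INTEGER)"))
    con.execute(text("INSERT INTO fruit VALUES ('apple', 3), ('orange', 5)"))

# Query inside a context manager so the connection is closed automatically.
with engine.connect() as con:
    rs = con.execute(text("SELECT name, qty FROM fruit ORDER BY name"))
    columns = list(rs.keys())
    df = pd.DataFrame([tuple(row) for row in rs.fetchall()], columns=columns)

engine.dispose()
print(df)
```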

#### Importing files from the web
Method to get file from web:
```
from urllib.request import urlretrieve 
url = 'url_name'
urlretrieve(url,'name_for_file')
```
**urllib** is a library for opening URLs; it is NOT a scraping tool. `urlretrieve` saves the file to the local environment.
To load a flat file from the web directly into a DF without saving it, use `pd.read_csv(url, sep='separator type')`. Pass `sep` as a keyword, since recent pandas versions no longer accept it positionally.<br>Lastly, this approach extends to other **Pandas read_file** type functions, for example `read_excel`.<br><br>
Method to read in an xls file from the web:
```
import pandas as pd
url = 'url location'
xls = pd.read_excel(url, sheet_name=None)  # or sheet_name=['name_1', 'name_2'] for specific sheets
```
`sheet_name=None` loads all sheets and returns a dict of DataFrames keyed by sheet name.<br><br>
How to retrieve HTML 
```
from urllib.request import urlopen,Request
url = 'url_text'
request = Request(url)
response = urlopen(request)
html = response.read()
response.close()
```
The response object is an instance of `http.client.HTTPResponse`, and `response.read()` returns the HTML as bytes.
<br><br>
How to do the above but with Python requests package:
```
import requests
url = 'url_text'
r = requests.get(url)
text = r.text
```
Note that `r.text` is an attribute, NOT a method.

#### Scraping in Python
Method to scrape a web page:
```
from bs4 import BeautifulSoup
import requests
url = 'url_text'
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'html.parser')  # name the parser explicitly to avoid a warning
```
If necessary, BS lets you reformat a retrieved HTML doc with `soup.prettify()`.<br>
To retrieve the title and text portions of an HTML doc, use the attribute `.title` and the method `.get_text()` respectively on a BS object.<br>
BS can also get all the URLs from a page with the `.find_all()` method.<br><br>
Retrieving URLs from html_doc with BS:
```
import requests
from bs4 import BeautifulSoup
url = 'url_text'
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'html.parser')
urls = soup.find_all('a')  # the tag name must be passed as a string
for link in urls:
    print(link.get('href'))
```
In the soup `find_all()` method you specify which HTML tags you want. To get URLs, pass the tag name `'a'` as a string. The for loop then retrieves each link's `href` attribute and prints it.
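
The same calls can be tried without a network connection by parsing an inline HTML string. The markup and the example.com URLs below are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two real links and one anchor that has no href.
html_doc = """
<html><head><title>Demo page</title></head>
<body>
  <p>Some <a href="https://example.com/a">first</a> and
     <a href="https://example.com/b">second</a> links,
     plus an <a>anchor without href</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

title = soup.title.string
# .get('href') returns None when the attribute is missing.
links = [a.get("href") for a in soup.find_all("a")]
# href=True keeps only the tags that actually carry an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(title)
print(links)
print(hrefs)
```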

#### APIs and JSON Intro
Method to extract info from json:
```
import json
with open('file.json','r') as json_file:
    json_data= json.load(json_file)
```
The code above opens a local JSON file, and `json.load` parses it into a dict of key-value pairs.
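
A runnable sketch of the same pattern, which first writes a small JSON file so there is something to load. The payload is invented.

```python
import json
import os
import tempfile

payload = {"title": "demo", "year": 2020, "tags": ["a", "b"]}  # hypothetical content
path = os.path.join(tempfile.mkdtemp(), "file.json")

# Write the sample file.
with open(path, "w") as json_file:
    json.dump(payload, json_file)

# Load it back: a JSON object becomes a dict, a JSON array becomes a list.
with open(path, "r") as json_file:
    json_data = json.load(json_file)

for key, value in json_data.items():
    print(f"{key}: {value}")
```
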
<br><br>
Method to connect to an API:
```
import requests
url = 'url_api_text'
r = requests.get(url)
json_data = r.json()
for key, value in json_data.items():
    print(key+ ':' ,value)
```
`dict.items()` returns a view of (key, value) tuples for the dict. The API URL depends on the API, so make sure to check the documentation of the API of interest.

#### Twitter Streaming API and Guide for Data Camp 
Note that to use the Twitter API you need to create an account and obtain authentication keys. For this course, a mock-up was created to avoid this.<br><br>
Method to set up the streaming API and stream in Python using the tweepy library:
```
import tweepy, json
access_token = 'token'
access_token_secret = 'secret'
consumer_key= 'key'
consumer_secret = 'con_secret' 

stream = tweepy.Stream(consumer_key,consumer_secret,access_token,access_token_secret)
stream.filter(track=['apples','oranges'])  # track is a keyword argument
```
Note that for actual code, it is probably best to keep the sensitive info in a separate file listed in .gitignore and pull the values in as variables, to prevent exposing them.
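
One possible way to do this, sketched with a hypothetical `twitter_keys.json` file and a hypothetical `load_keys` helper. Neither comes from the course. In a real project the file would be created by hand and listed in .gitignore; here it is written to a temporary directory with dummy values so the example runs.

```python
import json
import os
import tempfile


def load_keys(path):
    """Read API credentials from a JSON file kept out of version control."""
    with open(path, "r") as f:
        return json.load(f)


# Demo only: create the credentials file with dummy values.
keys_path = os.path.join(tempfile.mkdtemp(), "twitter_keys.json")
with open(keys_path, "w") as f:
    json.dump(
        {
            "consumer_key": "key",
            "consumer_secret": "con_secret",
            "access_token": "token",
            "access_token_secret": "secret",
        },
        f,
    )

keys = load_keys(keys_path)
consumer_key = keys["consumer_key"]
consumer_secret = keys["consumer_secret"]
access_token = keys["access_token"]
access_token_secret = keys["access_token_secret"]
```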
