# Going Green Repo

All data used in this repo is available from the [CCC Open Data Portal](https://opendata.mass-cannabis-control.com/).

In part, the mission of the Cannabis Control Commission (Commission) is to honor the will of the voters of Massachusetts by safely, equitably, and effectively implementing and administering the laws enabling access to medical and adult-use marijuana in the Commonwealth. Our mission is guided by operating principles to conduct all our processes openly and transparently, engage in regular two-way communication with all concerned constituents, and publicly measure our performance to effectuate a world-class agency.

Our Open Data Platform will support our mission and operating principles by allowing the Commission to measure its effectiveness at regulating the adult-use industry and Medical Use of Marijuana Program, ensuring public health and safety, implementing our equity provisions, and promoting full participation by small and large businesses.

#### References & Documentation:

- [SODA Developers](https://dev.socrata.com/)
- [Example](https://dev.socrata.com/foundry/opendata.mass-cannabis-control.com/hmwt-yiqy)

-------

## ETL Process

### Modules Needed

**Check requirements.txt for more info** 

In [1]:
# Handle environment variables
from dotenv import load_dotenv
import os

# Data wrangling and manipulation
import pandas as pd
import numpy as np

# API 
from sodapy import Socrata

# database engine 
from sqlalchemy import create_engine

In [2]:
# Load all environment variables

load_dotenv()
API_KEY = os.getenv('API_KEY')
USERNAME = os.getenv('USRNM')
PASSWORD = os.getenv('PASSWORD')
DBPASS = os.getenv('DBPASS')
DBUSER = os.getenv('DBUSER')
DATABASE = os.getenv('DATABASE')
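
`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file only surfaces later as an authentication or connection error. A minimal sketch of an early check follows; `missing_env_vars` is a hypothetical helper, not part of this repo, and the names in `REQUIRED` are copied from the cell above.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Variable names as read by the cell above
REQUIRED = ["API_KEY", "USRNM", "PASSWORD", "DBPASS", "DBUSER", "DATABASE"]

missing = missing_env_vars(REQUIRED)
if missing:
    # Report every gap at once instead of failing on the first one
    print("Missing environment variables:", ", ".join(missing))
```

Run it right after `load_dotenv()` so that any gaps are reported before the API client or database engine is created.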

### Web Scraping CCC Data Catalog

In [3]:
# import helper function 
from helper_functions import get_endpoints

In [4]:
api_links = get_endpoints('https://opendata.mass-cannabis-control.com/browse')
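
The `get_endpoints` helper lives in `helper_functions` and is not shown here. As a rough illustration of what such a helper might do, the sketch below collects Socrata dataset identifiers (the "four-by-four" codes such as `hmwt-yiqy`) from the links in a page of HTML. The function name `extract_dataset_ids` and the assumption that the catalog page exposes dataset links as plain `<a href>` tags are hypothetical; the real helper may work differently.

```python
import re
from html.parser import HTMLParser

# Socrata dataset identifiers follow a "four-by-four" pattern, e.g. hmwt-yiqy
FOUR_BY_FOUR = re.compile(r"/([a-z0-9]{4}-[a-z0-9]{4})(?:[/?#]|$)")


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_dataset_ids(html):
    """Return the unique dataset identifiers found in link targets, in page order."""
    collector = _LinkCollector()
    collector.feed(html)
    ids = []
    for href in collector.hrefs:
        match = FOUR_BY_FOUR.search(href)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids
```

Identifiers in this form are what `client.get` expects as its first argument.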

---

### Data Extraction Process

In [7]:
# Authenticated client (needed for non-public datasets)
client = Socrata("opendata.mass-cannabis-control.com",
                 API_KEY,
                 username=USERNAME,
                 password=PASSWORD)

# Pull data via the API endpoint, returning at most 6000 rows
results = client.get(api_links[17], limit=6000)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
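
A fixed `limit=6000` works while the dataset stays under 6000 rows, but it would silently truncate a larger one. One way around that is to page through the results with `limit` and `offset`. The sketch below is an assumption-laden illustration rather than part of this notebook: `fetch_all` and `fetch_page` are hypothetical names, and the page size of 1000 is arbitrary.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect every record by requesting pages until a short page comes back.

    fetch_page is any callable taking (limit, offset) and returning a list.
    """
    records = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        records.extend(page)
        # A page shorter than page_size means there is nothing left to fetch
        if len(page) < page_size:
            break
        offset += page_size
    return records


# Possible usage with the client above (not executed here):
# results = fetch_all(
#     lambda limit, offset: client.get(api_links[17], limit=limit,
#                                      offset=offset, order=":id")
# )
```

Passing `order=":id"` keeps the row order stable between requests, which matters when paging through a dataset.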

In [8]:
results_df.shape

(3214, 24)

In [9]:
results_df.head()

| | license_number | application_number | application_status | approved_license_type | business_name | license_type | establishment_address_1 | establishment_city | establishment_state | establishment_zip_code | ... | first_name_pwa | last_name_pwa | race_ethnicity_pwa | gender_pwa | percentage_ownership_pwa | percentage_control_pwa | version | middle_name_pwa | other_role_pwa | dba_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MP282046 | MPN282046 | APPROVED | PROVISIONAL CONSIDERATION | Debilitating Medical Condition Treatment Centers | Marijuana Product Manufacturer | 578-582 Meadow Street Extension | Agawam | MA | 1001 | ... | David | Goldblum | Decline to Answer | Male | 6.7 | 25 | CURRENT REPORTING PERIOD | | | |
| 1 | MP282046 | MPN282046 | APPROVED | PROVISIONAL CONSIDERATION | Debilitating Medical Condition Treatment Centers | Marijuana Product Manufacturer | 578-582 Meadow Street Extension | Agawam | MA | 1001 | ... | Grant | Guelich | Decline to Answer | Male | 15.4 | 0 | CURRENT REPORTING PERIOD | | | |
| 2 | MP282046 | MPN282046 | APPROVED | PROVISIONAL CONSIDERATION | Debilitating Medical Condition Treatment Centers | Marijuana Product Manufacturer | 578-582 Meadow Street Extension | Agawam | MA | 1001 | ... | Bradley | Joseph | Decline to Answer | Male | 13.1 | 25 | CURRENT REPORTING PERIOD | | | |
| 3 | MP282046 | MPN282046 | APPROVED | PROVISIONAL CONSIDERATION | Debilitating Medical Condition Treatment Centers | Marijuana Product Manufacturer | 578-582 Meadow Street Extension | Agawam | MA | 1001 | ... | Samuel | Hanmer | Decline to Answer | Male | 7.7 | 25 | CURRENT REPORTING PERIOD | | | |
| 4 | MP282046 | MPN282046 | APPROVED | PROVISIONAL CONSIDERATION | Debilitating Medical Condition Treatment Centers | Marijuana Product Manufacturer | 578-582 Meadow Street Extension | Agawam | MA | 1001 | ... | Jared | Glanz-berger | Decline to Answer | Male | 20.8 | 25 | CURRENT REPORTING PERIOD | | | |
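
In the preview above, `establishment_zip_code` shows `1001` for Agawam, whose ZIP code is 01001, so the leading zero has been lost somewhere along the way. If the same truncation is present in `results_df`, it is worth repairing before the load step. The sketch below uses a hypothetical helper, `normalize_zip`, and assumes plain five-digit ZIP codes.

```python
def normalize_zip(value, width=5):
    """Left-pad an all-digit ZIP code with zeros; leave other values unchanged."""
    text = str(value).strip()
    if text.isdigit() and len(text) < width:
        return text.zfill(width)
    return text


print(normalize_zip(1001))     # 01001
print(normalize_zip("02139"))  # 02139
```

With pandas, the equivalent column-wide operation would be along the lines of `results_df['establishment_zip_code'].astype(str).str.zfill(5)`.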


---

### Transformation Process

In [None]:
# Placeholder geometry for rows that have no geocoded location
fill = {'type': 'Point', 'coordinates': [0.00, 0.00]}
# Raw geocoded values, kept for inspection
x = list(results_df['geocoded_column'])



In [None]:
# Pull the coordinate pair out of each geocoded point.
# Rows without a location hold NaN instead of a dict and fall back to [0, 0].
df_items = []

for point in results_df['geocoded_column']:
    if isinstance(point, dict) and 'coordinates' in point:
        df_items.append(point['coordinates'])
    else:
        df_items.append([0.0, 0.0])

In [None]:
# GeoJSON points store coordinates as [longitude, latitude]
geo_col = pd.DataFrame(df_items, columns=['long', 'lat'])

In [None]:
# Append the coordinate columns to the original DataFrame
results_df = pd.concat([results_df, geo_col], axis=1)

In [None]:
results_df = results_df.drop(['geocoded_column'], axis = 1)

In [None]:
results_df

----

### Load Process

In [None]:
from sqlalchemy import create_engine

# Create the engine used to store the results
engine = create_engine(f"mysql+pymysql://{DBUSER}:{DBPASS}@localhost/{DATABASE}")

# Push the DataFrame to the SQL database.
# If the table already exists, it is replaced by the new extract on every refresh.
results_df.to_sql('approved_licenses', con=engine, if_exists='replace', index=False)
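
The connection string above interpolates `DBUSER` and `DBPASS` directly, which breaks if either contains characters such as `@` or `/`. A minimal sketch of a safer way to build the URL is to percent-encode the credentials first. `mysql_url` is a hypothetical helper, and the user, password, and database name in the example are made-up values.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, database, host="localhost"):
    """Build a mysql+pymysql URL, percent-encoding the credentials."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


# Made-up credentials, for illustration only
print(mysql_url("etl_user", "p@ss/word", "going_green"))
# mysql+pymysql://etl_user:p%40ss%2Fword@localhost/going_green
```

The result can be passed to `create_engine` in place of the f-string, for example `create_engine(mysql_url(DBUSER, DBPASS, DATABASE))`.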

**SQLAlchemy Resources**

* [Docs](https://docs.sqlalchemy.org/en/14/)
* [Overview](https://docs.sqlalchemy.org/en/14/intro.html)
* [Tutorial](https://docs.sqlalchemy.org/en/14/tutorial/index.html)

### Now go have some fun slicing and dicing in your own personal DB!