# Basic ETL with Pandas, Azure Cosmos DB and GitHub Codespaces
Convert a filtered CSV file to JSON, then insert it into Azure Cosmos DB in minutes with GitHub Codespaces.

1. [Create an Azure Cosmos DB for NoSQL account](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/quickstart-portal). You can stop once the resource has been created.

2. After the account is created in the Azure Portal, navigate to the resource (you can find it in your notifications).

3. In the resource menu next to the overview, select `Keys`, then locate the `URI` and `PRIMARY KEY` secrets.

    ![Azure cosmos db secrets](img/azcosmosdb_secrets.png)
 
4. Copy both values and save them as secrets in your [Codespaces settings](https://github.com/settings/codespaces).
    **`URI` should be the `COSMOS_ENDPOINT` secret and `PRIMARY KEY` should be the `COSMOS_KEY` secret.**
     ![codespaces secrets](img/codespaces_secret_settings.png)

5. Run this notebook.

6. **[Clean up your Cosmos DB Account Resources after you're done!](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/quickstart-portal#clean-up-resources)**
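
Before running the cells below, it can help to confirm that the secrets from step 4 are visible inside the codespace. The following is a minimal sketch, not part of the original notebook; the helper name `missing_secrets` is hypothetical, and only the two secret names come from the steps above. If a secret is reported missing, restarting the codespace after saving the secret is a likely fix.

```python
import os

# Secret names taken from step 4 above.
REQUIRED_SECRETS = ("COSMOS_ENDPOINT", "COSMOS_KEY")


def missing_secrets(environ=os.environ):
    """Return the names of required secrets that are unset or empty."""
    return [name for name in REQUIRED_SECRETS if not environ.get(name)]


missing = missing_secrets()
if missing:
    print("Missing Codespaces secrets:", ", ".join(missing))
else:
    print("All Cosmos DB secrets are available")
```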
 

In [4]:
# Read the CSV file into a Pandas data frame
import pandas as pd
import os
import json
import uuid


filename = 'airports.csv'
print('Reading the csv file into Pandas data frame')
df = pd.read_csv(filename)

# Filter down to airports in the United States
us_airports = df.query('country == "United States"')

us_airports

Reading the csv file into Pandas data frame


Unnamed: 0,airport_id,name,city,country,iata,icao,latitude,longitude,altitude,timezone,dst,timezone_name,type,source
3212,3411,Barter Island LRRS Airport,Barter Island,United States,BTI,PABA,70.134003,-143.582001,2,-9.0,A,America/Anchorage,airport,OurAirports
3213,3412,Wainwright Air Station,Fort Wainwright,United States,,PAWT,70.613403,-159.860001,35,-9.0,A,America/Anchorage,airport,OurAirports
3214,3413,Cape Lisburne LRRS Airport,Cape Lisburne,United States,LUR,PALU,68.875099,-166.110001,16,-9.0,A,America/Anchorage,airport,OurAirports
3215,3414,Point Lay LRRS Airport,Point Lay,United States,PIZ,PPIZ,69.732903,-163.005005,22,-9.0,A,America/Anchorage,airport,OurAirports
3216,3415,Hilo International Airport,Hilo,United States,ITO,PHTO,19.721399,-155.048004,38,-10.0,N,Pacific/Honolulu,airport,OurAirports
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7637,13717,Camp Pendleton MCAS (Munn Field) Airport,Oceanside,United States,,KNFG,33.301300,-117.355003,78,,,,airport,OurAirports
7651,13757,Vidalia Regional Airport,Vidalia,United States,VDI,KVDI,32.192699,-82.371201,275,-4.0,A,,airport,OurAirports
7652,13758,Granbury Regional Airport,Granbury,United States,,KGDJ,32.444401,-97.816902,778,-5.0,A,,airport,OurAirports
7653,13759,Oswego County Airport,Fulton,United States,,KFZY,43.350800,-76.388100,475,-4.0,A,,airport,OurAirports


In [None]:
# Transform a random sample of three airports to a JSON string of records
us_airports_json = us_airports.sample(n=3).to_json(orient='records')

# Parse the JSON string into a list of dicts, one per airport
data = json.loads(us_airports_json)
data
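
To illustrate the shape of `data`, here is a small sketch that needs only the standard library. The JSON string is an assumed, hand-written stand-in for what `to_json(orient='records')` would produce for the Wainwright Air Station row shown in the output above, reduced to four columns. That row has no IATA code, and pandas writes missing values as JSON `null`, which `json.loads` turns into `None`.

```python
import json

# Hand-written stand-in for to_json(orient='records') output (assumption):
# one airport from the table above, with only a few columns kept.
records_json = (
    '[{"airport_id":3412,"name":"Wainwright Air Station",'
    '"iata":null,"icao":"PAWT"}]'
)

# json.loads yields a list with one dict per record
records = json.loads(records_json)

print(type(records).__name__)
print(records[0]["iata"])
```

Each dict in the list becomes one Cosmos DB item in the next cell.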

In [None]:

from azure.cosmos import CosmosClient, PartitionKey

# Read the endpoint and key from the Codespaces secrets set in step 4
ENDPOINT = os.environ["COSMOS_ENDPOINT"]
KEY = os.environ["COSMOS_KEY"]

DATABASE_NAME = "demo"
CONTAINER_NAME = "airports"

client = CosmosClient(url=ENDPOINT, credential=KEY)

database = client.create_database_if_not_exists(id=DATABASE_NAME)
print("Database\t", database.id)

key_path = PartitionKey(path="/airport_id")

container = database.create_container_if_not_exists(
    id=CONTAINER_NAME, partition_key=key_path, offer_throughput=400
)
print("Container\t", container.id)



# Give each airport a unique id (required by Cosmos DB), then insert it
for airport in data:
    airport['id'] = str(uuid.uuid4())
    container.create_item(airport)

print('Data has been imported')
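
Because the loop above assigns a random `id`, running the cell again inserts the same airports a second time. One possible alternative, sketched here as an assumption rather than the notebook's method, derives a repeatable `id` from `airport_id` and uses the SDK's `upsert_item` so that repeated runs overwrite instead of duplicating. The helper names `stable_id` and `load_airports` are hypothetical, and the sketch assumes `airport_id` is unique per airport.

```python
import uuid


def stable_id(airport):
    """Derive a repeatable id from airport_id (assumed unique per airport)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"airport/{airport['airport_id']}"))


def load_airports(container, airports):
    """Upsert every airport and return the number of items written."""
    count = 0
    for airport in airports:
        # Copy the record so the caller's dict is left unchanged
        item = dict(airport, id=stable_id(airport))
        container.upsert_item(item)
        count += 1
    return count
```

With the `container` and `data` objects from the cells above, the call would be `load_airports(container, data)`.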
