# Getting started with Cosmos notebooks
In this notebook, we'll learn how to use Cosmos notebook features. We'll create a database and a container in Azure Cosmos DB, import some sample data into the container, and run some queries over it.

### Connecting to Azure Cosmos DB

To connect to the service, you can use our built-in instance of ```cosmos_client```. This is a ready-to-use instance of [CosmosClient](https://docs.microsoft.com/python/api/azure-cosmos/azure.cosmos.cosmos_client.cosmosclient?view=azure-python) from our Python SDK. It already has the context of this account baked in.

We'll use ```cosmos_client``` to create a new database called **RetailDemo** and a new container called **WebsiteData**.

#### Create a new database

In [67]:
import azure.cosmos.errors as errors

## Create a new database if it doesn't exist.
database_id = "RetailDemo"
database_link = 'dbs/' + database_id

try:
    cosmos_client.CreateDatabase({"id": database_id})
    print('Database with id \'{0}\' created'.format(database_id))

except errors.HTTPFailure as e:
    if e.status_code == 409:
        print('A database with name \'{0}\' already exists'.format(database_id))
    else:
        raise

Database with id 'RetailDemo' created


#### Create a new container
Our dataset contains events that occurred on the website, such as a user viewing an item, adding it to their cart, or purchasing it. We will partition by CartId, which identifies the individual cart of each user. Because events are spread across many distinct carts, this should give us an even distribution of throughput and storage in our container. Learn more about how to [choose a good partition key](https://docs.microsoft.com/azure/cosmos-db/partition-data).
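
To see why a key with many distinct values tends to spread data evenly, here is a minimal sketch. It assumes hash-based placement into a made-up number of buckets (4) and uses SHA-256 purely for illustration. `bucket_for` and `distribution` are hypothetical helpers, not part of the SDK, and this is not the hashing scheme Cosmos DB actually uses.

```python
import hashlib
from collections import Counter


def bucket_for(partition_key_value, bucket_count):
    """Map a partition key value to a bucket using a stable hash.

    Illustrative only: this is NOT the hash function Cosmos DB uses.
    """
    digest = hashlib.sha256(str(partition_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count


def distribution(values, bucket_count):
    """Count how many values land in each bucket."""
    return Counter(bucket_for(value, bucket_count) for value in values)


# 10,000 hypothetical cart IDs: many distinct key values
by_cart = distribution(range(10000), bucket_count=4)

# The same number of events keyed by a low-cardinality field instead.
# The action names here are assumed values for illustration.
actions = ["Viewed", "Added", "Purchased"]
by_action = distribution((actions[i % 3] for i in range(10000)), bucket_count=4)

print("By cart ID:", sorted(by_cart.items()))
print("By action: ", sorted(by_action.items()))
```

With only three distinct actions, at most three buckets can ever receive data, however many events arrive. The cart IDs, by contrast, land in all four buckets in roughly equal numbers.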

In [68]:
## Create a new container if it doesn't already exist
container_id = "WebsiteData"
container_link = database_link + '/colls/' + container_id
try:
    container_definition = {
        "id": container_id,
        "partitionKey": {
            "paths": [
              "/CartId"
            ]
        }
    }

    container = cosmos_client.CreateContainer(database_link, container_definition)
    print('Container with id \'{0}\' created'.format(container_id))

except errors.HTTPFailure as e:
    if e.status_code == 409:
        print('A container with id \'{0}\' already exists'.format(container_id))
    else:
        raise

Container with id 'WebsiteData' created
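
Both cells above follow the same pattern: attempt the create call and treat an HTTP 409 (conflict) as "already exists". As a rough sketch, that pattern could be factored into a small helper. `create_if_missing` is a hypothetical name, not an SDK function, and `FakeConflict` stands in for the SDK's error type so the example runs without a Cosmos account.

```python
def create_if_missing(create, description):
    """Call create(); treat an error with status_code 409 as 'already exists'.

    Returns True if the resource was created, False if it already existed.
    Any other error is re-raised unchanged.
    """
    try:
        create()
    except Exception as e:
        # The SDK's HTTP errors expose a status_code attribute; anything
        # without a 409 status is not a conflict and should propagate.
        if getattr(e, "status_code", None) == 409:
            print("{0} already exists".format(description))
            return False
        raise
    print("{0} created".format(description))
    return True


class FakeConflict(Exception):
    """Stand-in for an SDK conflict error (assumption for this demo)."""
    status_code = 409


def already_there():
    raise FakeConflict()


create_if_missing(lambda: None, "Database 'RetailDemo'")
create_if_missing(already_there, "Database 'RetailDemo'")
```

In the notebook, the first argument would wrap the real call, for example `lambda: cosmos_client.CreateDatabase({"id": database_id})`.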


#### Set the default database and container context to the new resources

We can use the ```%database {database_id}``` and ```%container {container_id}``` magic commands to set the default database and container.

In [69]:
%database RetailDemo

In [70]:
%container WebsiteData

### Create your own custom ```CosmosClient``` instance

For more flexibility, you can create your own instance of ```CosmosClient``` and pass in custom options. Here, we pass in our own [ConnectionPolicy](https://docs.microsoft.com/python/api/azure-cosmos/azure.cosmos.documents.connectionpolicy?view=azure-python).

In [71]:
import os
import azure.cosmos.cosmos_client as cosmos
import azure.cosmos.documents as documents

connectionPolicy = documents.ConnectionPolicy()
connectionPolicy.PreferredLocations = ["West US 2", "East US 2"] # Set the order of regions the SDK will route requests to

## Create a new instance of CosmosClient, getting the endpoint and key from the environment variables
## and passing in the custom connection policy defined above
client = cosmos.CosmosClient(os.environ["COSMOS_ENDPOINT"], {'masterKey': os.environ["COSMOS_KEY"]}, connection_policy=connectionPolicy)

database_id = "RetailDemo"
database_link = 'dbs/' + database_id

client.ReadDatabase(database_link)

{'id': 'RetailDemo',
 '_rid': 'zlgUAA==',
 '_self': 'dbs/zlgUAA==/',
 '_etag': '"0000f000-0000-0400-0000-5d6584090000"',
 '_colls': 'colls/',
 '_users': 'users/',
 '_ts': 1566934025}

### Load in sample JSON data and insert into the container
We'll use the **UpsertItem** operation to create the item if it doesn't exist, or replace it if it already exists. This will take a few minutes.

Here's a sample JSON document.
```
{"CartID":5399,
"Action":"Viewed",
"Item":"Cosmos T-shirt",
"Price":350,
"UserName":"Chadrick.Larkin87",
"Country":"Iceland",
"EventDate":"2015-06-25T00:00:00",
"Year":2015,
"Latitude":-66.8673,
"Longitude":-29.8214,
"Address":"852 Modesto Loop, Port Ola, Iceland",
"id":"00ffd39c-7e98-4451-9b91-b2bcf2f9a32d"}
```
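
Before inserting, it can help to check that each event carries the fields the container relies on. The sketch below uses a hypothetical helper, `missing_fields`, on a shortened copy of the sample document. JSON property names are case-sensitive, and the sample above spells the field `CartID` while the container was created with the partition key path `/CartId`, so it may be worth confirming which spelling the dataset actually uses.

```python
import json


def missing_fields(event, required):
    """Return the required field names that are absent from the event."""
    return [name for name in required if name not in event]


# A shortened copy of the sample document shown above
sample = json.loads("""
{"CartID": 5399, "Action": "Viewed", "Item": "Cosmos T-shirt",
 "id": "00ffd39c-7e98-4451-9b91-b2bcf2f9a32d"}
""")

print(missing_fields(sample, ["id", "CartID"]))  # prints []
print(missing_fields(sample, ["id", "CartId"]))  # prints ['CartId']
```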

In [72]:
## Read data from storage
import urllib.request, json 
with urllib.request.urlopen("https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json") as url:
    data = json.loads(url.read().decode())

## Upsert each event; any SDK error propagates and stops the loop
for event in data:
    client.UpsertItem(container_link, event)
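
The loop above stops at the first error. For a longer import, one alternative is to keep going and report failures at the end. The following is a minimal sketch of that idea. `upsert_all` is a hypothetical helper, and `fake_upsert` stands in for the real SDK call so the example runs on its own.

```python
def upsert_all(upsert, events):
    """Upsert every event, collecting failures instead of stopping.

    Returns (succeeded_count, failures), where failures is a list of
    (event id, exception) pairs.
    """
    succeeded = 0
    failures = []
    for event in events:
        try:
            upsert(event)
            succeeded += 1
        except Exception as e:  # in the notebook, narrow this to the SDK's error type
            failures.append((event.get("id"), e))
    return succeeded, failures


def fake_upsert(event):
    """Stand-in for the real upsert call (assumption for this demo)."""
    if "id" not in event:
        raise ValueError("event has no id")


events = [{"id": "a"}, {"Action": "Viewed"}, {"id": "b"}]
succeeded, failures = upsert_all(fake_upsert, events)
print("Succeeded:", succeeded)
print("Failed ids:", [event_id for event_id, _ in failures])
```

In the notebook, the first argument could be `lambda event: client.UpsertItem(container_link, event)` and the second the `data` list loaded above.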

### Run a query against Azure Cosmos DB, using **cosmos_client**
We'll run the query **SELECT VALUE COUNT(1) FROM c** to count the number of documents in the container.

In [73]:
## Run a query against the container to see number of documents
query = {'query': 'SELECT VALUE COUNT(1) FROM c'}

options = {}
options['enableCrossPartitionQuery'] = True

result_iterable = cosmos_client.QueryItems(container_link, query, options)
for item in result_iterable:
    print('Container with id \'{0}\' contains \'{1}\' items'.format(container_id, item))

Container with id 'WebsiteData' contains '2654' items
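
The query dictionary can also carry parameters, which avoids building query text by string concatenation. Below is a small sketch of constructing such a dictionary. `build_query` is a hypothetical helper, and the `c.Action` filter is an example we chose. The resulting `{'query': ..., 'parameters': [...]}` shape is our understanding of the query spec the SDK accepts, so treat it as an assumption to verify against the SDK reference.

```python
def build_query(text, **parameters):
    """Build a query spec dict with named parameters.

    Each keyword argument becomes a parameter named '@<keyword>'.
    """
    return {
        "query": text,
        "parameters": [
            {"name": "@" + name, "value": value}
            for name, value in parameters.items()
        ],
    }


query = build_query(
    "SELECT VALUE COUNT(1) FROM c WHERE c.Action = @action",
    action="Viewed",
)
print(query)
```

The resulting `query` could then be passed to `cosmos_client.QueryItems(container_link, query, options)` in place of the fixed query used above.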


### Run some queries against Azure Cosmos DB, using the built-in notebook magic
We'll use the syntax:

```
%%sql --database {database_id} --container {container_id} --output outputDataframeVar
{Query text}
```

This allows us to output the results of the query directly into a pandas DataFrame.


In [74]:
%%sql --database RetailDemo --container WebsiteData --output df_cosmos
SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c

In [75]:
# See a sample of the result
df_cosmos.head(10)

| | Action | Country | Item | ItemRevenue |
|---|---|---|---|---|
| 0 | Viewed | Tunisia | Black Tee | 9.0 |
| 1 | Viewed | Antigua and Barbuda | Flannel Shirt | 19.99 |
| 2 | Added | Guinea-Bissau | Socks | 3.75 |
| 3 | Viewed | Guinea-Bissau | Socks | 3.75 |
| 4 | Viewed | Czech Republic | Rainjacket | 55.0 |
| 5 | Viewed | Iceland | Cosmos T-shirt | 350.0 |
| 6 | Added | Syrian Arab Republic | Button-Up Shirt | 19.99 |
| 7 | Viewed | Syrian Arab Republic | Button-Up Shirt | 19.99 |
| 8 | Viewed | Tuvalu | Red Top | 33.0 |
| 9 | Viewed | Cape Verde | Flip Flop Shoes | 14.0 |
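
As a quick illustration of the kind of follow-up analysis this enables, the sketch below totals events and revenue per action. To stay self-contained it uses plain Python on the ten rows shown above, copied in as literals, rather than the full `df_cosmos` data frame.

```python
from collections import defaultdict

# The ten sample rows shown above, as (Action, Item, ItemRevenue)
rows = [
    ("Viewed", "Black Tee", 9.0),
    ("Viewed", "Flannel Shirt", 19.99),
    ("Added", "Socks", 3.75),
    ("Viewed", "Socks", 3.75),
    ("Viewed", "Rainjacket", 55.0),
    ("Viewed", "Cosmos T-shirt", 350.0),
    ("Added", "Button-Up Shirt", 19.99),
    ("Viewed", "Button-Up Shirt", 19.99),
    ("Viewed", "Red Top", 33.0),
    ("Viewed", "Flip Flop Shoes", 14.0),
]

counts = defaultdict(int)
revenue = defaultdict(float)
for action, _item, item_revenue in rows:
    counts[action] += 1
    revenue[action] += item_revenue

for action in sorted(counts):
    print("{0}: {1} events, {2:.2f} total ItemRevenue".format(
        action, counts[action], revenue[action]))
```

On the data frame itself, a comparable aggregation would be `df_cosmos.groupby("Action")["ItemRevenue"].sum()`.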


We can get more information about the ```%%sql``` command using ```%%sql?```.

In [76]:
%%sql?

Docstring:
::

  %sql [--database DATABASE] [--container CONTAINER] [--output OUTPUT]

Queries Azure Cosmos DB using the given Cosmos database and container.
Learn about the Cosmos query language: https://aka.ms/CosmosQuery

Example:
    %%sql --database databaseName --container containerName
    SELECT top 1 r.id, r._ts from r order by r._ts desc

optional arguments:
  --database DATABASE, -d DATABASE
                        If provided, this Cosmos database will be used;
  --container CONTAINER, -c CONTAINER
                        If provided, this Cosmos container will be used;
  --output OUTPUT       The dataframe of the result will be stored in a
                        variable with this name.
File:      /usr/local/lib/python3.6/dist-packages/cosmos_sql/__init__.py


### Next steps

Now that you've learned how to use basic notebook functionality, follow the **Visualization.ipynb** notebook to further analyze and visualize our data. You can find it under the **Sample Notebooks** section.