# Getting started with Cosmos notebooks
In this notebook, we'll learn how to use Cosmos notebook features. We'll create a database and container in Azure Cosmos DB, import some sample data into the container, and run some queries over it.

### Connecting to Azure Cosmos DB

To connect to the service, you can use our built-in instance of ```cosmos_client```. This is a ready-to-use instance of [CosmosClient](https://docs.microsoft.com/python/api/azure-cosmos/azure.cosmos.cosmos_client.cosmosclient?view=azure-python) from our Python SDK. It already has the context of this account baked in.

We'll use ```cosmos_client``` to create a new database called **RetailDemo** and a container called **WebsiteData**.

#### Create a new database

In [27]:
import azure.cosmos.errors as errors

# Create a new database if it doesn't already exist
database_id = 'RetailDemo'

try:
    database = cosmos_client.create_database(database_id)
    print('Database with id \'{0}\' created'.format(database_id))

except errors.HTTPFailure as e:
    if e.status_code == 409:
        print('A database with id \'{0}\' already exists'.format(database_id))
    else:
        raise  # Surface any failure other than "already exists"

# Now we have a reference to the database we can use
database = cosmos_client.get_database_client(database_id)

Database with id 'RetailDemo' created


#### Create a new container
Our dataset will contain events that occurred on the website, e.g. a user viewing an item, adding it to their cart, or purchasing it. We will partition by CartID, which represents the individual cart of each user. This will give us an even distribution of throughput and storage in our container. Learn more about how to [choose a good partition key](https://docs.microsoft.com/azure/cosmos-db/partition-data).

In [28]:
import azure.cosmos.errors as errors
from azure.cosmos.partition_key import PartitionKey

# Create a new container if it doesn't already exist
container_id = 'WebsiteData'

try:
    container = database.create_container(id=container_id, partition_key=PartitionKey(path="/CartID"))
    print('Container with id \'{0}\' created'.format(container_id))
except errors.HTTPFailure as e:
    if e.status_code == 409:
        print('A container with id \'{0}\' already exists'.format(container_id))
    else:
        raise  # Surface any failure other than "already exists"

# Now we have a reference to the container we can use
container = database.get_container_client(container_id)

Container with id 'WebsiteData' created
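
To sanity-check that a candidate partition key spreads data evenly, one option is to count how many events fall under each key value. The sketch below is only an illustration: `partition_key_stats` and the sample events are hypothetical, and it measures logical partition sizes in a local list rather than anything reported by the service.

```python
from collections import Counter


def partition_key_stats(events, key="CartID"):
    """Return (distinct key count, largest share of events held by one key)."""
    counts = Counter(event[key] for event in events)
    total = sum(counts.values())
    largest_share = max(counts.values()) / total if total else 0.0
    return len(counts), largest_share


# Hypothetical sample events shaped like the documents in this notebook
sample_events = [
    {"CartID": 5399, "Action": "Viewed"},
    {"CartID": 5399, "Action": "Added"},
    {"CartID": 1020, "Action": "Viewed"},
    {"CartID": 7741, "Action": "Purchased"},
]

distinct_keys, largest_share = partition_key_stats(sample_events)
print(distinct_keys, largest_share)
```

Many distinct keys with a small largest share suggest the key distributes well. A single key holding most of the events would point to a hot partition.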


#### Set the default database and container context to the new resources

We can use the ```%database {database_id}``` and ```%container {container_id}``` syntax.

In [29]:
%database RetailDemo

In [30]:
%container WebsiteData

### Create your own custom ```CosmosClient``` instance

For more flexibility, you can create your own instance of ```CosmosClient``` and pass in custom options. Here, we pass in our own [ConnectionPolicy](https://docs.microsoft.com/python/api/azure-cosmos/azure.cosmos.documents.connectionpolicy?view=azure-python).

In [31]:
import os
import azure.cosmos.cosmos_client as cosmos
import azure.cosmos.documents as documents

# These should be set to regions you've added for Cosmos DB
region_1 = "Central US"
region_2 = "East US 2"

custom_connection_policy = documents.ConnectionPolicy()
# Set the order of regions the SDK will route requests to.
# These must be regions you've added for Cosmos DB, otherwise this will error.
custom_connection_policy.PreferredLocations = [region_1, region_2]

# Create a new instance of CosmosClient, getting the endpoint and key from the environment variables
client = cosmos.CosmosClient(os.environ["COSMOS_ENDPOINT"], {'masterKey': os.environ["COSMOS_KEY"]}, connection_policy=custom_connection_policy)

# List all databases 
list(client.read_all_databases())

[{'id': 'RetailDemo',
  '_rid': '20cyAA==',
  '_self': 'dbs/20cyAA==/',
  '_etag': '"00001901-0000-0300-0000-5d7408300000"',
  '_colls': 'colls/',
  '_users': 'users/',
  '_ts': 1567885360},
 {'id': 'iddbtest',
  '_rid': 'LJhBAA==',
  '_self': 'dbs/LJhBAA==/',
  '_etag': '"00000701-0000-0300-0000-5d73f54e0000"',
  '_colls': 'colls/',
  '_users': 'users/',
  '_ts': 1567880526}]
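
Because the custom client reads its endpoint and key from environment variables, it can help to fail early with a clear message when one of them is unset. The following is a minimal sketch, not part of the SDK: `read_cosmos_settings` is a hypothetical helper, and the endpoint and key in the demonstration are placeholder values.

```python
import os


def read_cosmos_settings(environ=os.environ):
    """Return (endpoint, key), raising a clear error if either is unset."""
    names = ("COSMOS_ENDPOINT", "COSMOS_KEY")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return environ["COSMOS_ENDPOINT"], environ["COSMOS_KEY"]


# Demonstration with a stand-in mapping (placeholder values, not real credentials)
endpoint, key = read_cosmos_settings(
    {"COSMOS_ENDPOINT": "https://example.documents.azure.com:443/", "COSMOS_KEY": "placeholder"}
)
print(endpoint)
```

Called with no arguments, the helper reads from `os.environ`, so its result could be passed to `CosmosClient` in place of the direct lookups.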

### Load in sample JSON data and insert into the container
We'll use the **UpsertItem** operation to create the item if it doesn't exist, or replace it if it already exists. This will take a few minutes.

Here's a sample JSON document.
```
{"CartID":5399,
"Action":"Viewed",
"Item":"Cosmos T-shirt",
"Price":350,
"UserName":"Chadrick.Larkin87",
"Country":"Iceland",
"EventDate":"2015-06-25T00:00:00",
"Year":2015,
"Latitude":-66.8673,
"Longitude":-29.8214,
"Address":"852 Modesto Loop, Port Ola, Iceland",
"id":"00ffd39c-7e98-4451-9b91-b2bcf2f9a32d"}
```
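
Before loading a large file, we may want a quick pre-flight check that each event carries the fields this walkthrough relies on. The sketch below assumes `id` and `CartID` are the fields worth checking, since the sample document has both and the container is partitioned on `/CartID`. `missing_fields` and `REQUIRED_FIELDS` are hypothetical names.

```python
import json

REQUIRED_FIELDS = ("id", "CartID")  # assumption: the fields this walkthrough relies on


def missing_fields(event, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from the event."""
    return [name for name in required if name not in event]


# A trimmed copy of the sample document above
sample = json.loads("""
{"CartID": 5399, "Action": "Viewed", "Item": "Cosmos T-shirt", "Price": 350,
 "id": "00ffd39c-7e98-4451-9b91-b2bcf2f9a32d"}
""")

print(missing_fields(sample))                # []
print(missing_fields({"Action": "Viewed"}))  # ['id', 'CartID']
```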

In [None]:
# We can install custom packages using pip install
import sys
!{sys.executable} -m pip install progressbar2 --user

In [32]:
## Read data from storage
import json
import urllib.request

import progressbar  # Installed in the previous cell

with urllib.request.urlopen("https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json") as url:
    data = json.loads(url.read().decode())

# Upsert each event, showing progress as we go; any SDK error stops the load
for event in progressbar.progressbar(data):
    container.upsert_item(body=event)

100% (2654 of 2654) |####################| Elapsed Time: 0:00:57 Time:  0:00:57
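
The loop above stops at the first failed upsert. If we would rather load as much as possible and review failures afterwards, one option is to collect the errors instead. This is a sketch under assumptions: `upsert_all` is a hypothetical helper, it catches `Exception` broadly where real code would narrow this to the SDK's error type, and `FakeContainer` is a stand-in used only so the example runs without a live account.

```python
def upsert_all(container, events):
    """Upsert each event; return (succeeded_count, failed), where failed holds (event, error) pairs."""
    succeeded, failed = 0, []
    for event in events:
        try:
            container.upsert_item(body=event)
            succeeded += 1
        except Exception as error:  # assumption: broad catch for illustration only
            failed.append((event, error))
    return succeeded, failed


class FakeContainer:
    """Hypothetical stand-in for a container client, used only to demonstrate the helper."""

    def __init__(self):
        self.items = {}

    def upsert_item(self, body):
        if "id" not in body:
            raise ValueError("item has no id")
        self.items[body["id"]] = body  # Same id replaces the earlier item


fake = FakeContainer()
succeeded, failed = upsert_all(
    fake, [{"id": "1"}, {"Action": "Viewed"}, {"id": "1", "Action": "Added"}]
)
print(succeeded, len(failed))
```

With the real `container` from earlier cells, the same helper could be called as `upsert_all(container, data)`.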


The new database and container should show up under the **Data** section. Use the refresh icon after completing the previous cell. 

![RefreshData](https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/refreshData.png)


### Run a query against Azure Cosmos DB, using the SDK
We'll run the query **SELECT VALUE COUNT(1) FROM c** to count the number of documents in the container.

In [33]:
## Run a query against the container to see number of documents
query = 'SELECT VALUE COUNT(1) FROM c'
result = list(container.query_items(query, enable_cross_partition_query=True))

print('Container with id \'{0}\' contains \'{1}\' items'.format(container_id, result[0]))

Container with id 'WebsiteData' contains '2654' items
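
When a query depends on a user-supplied value, passing it as a parameter is generally safer than formatting it into the query string. The sketch below builds the arguments for such a query. `count_by_action_query` is a hypothetical helper, and it assumes the installed SDK's `query_items` accepts a `parameters` argument, which is worth confirming for your SDK version.

```python
def count_by_action_query(action):
    """Build keyword arguments for a parameterized count query."""
    return {
        "query": "SELECT VALUE COUNT(1) FROM c WHERE c.Action = @action",
        "parameters": [{"name": "@action", "value": action}],
        "enable_cross_partition_query": True,
    }


kwargs = count_by_action_query("Viewed")
print(kwargs["parameters"])

# With a live container, the call would look like:
# result = list(container.query_items(**kwargs))
```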


### Run some queries against Azure Cosmos DB, using the built-in notebook magic
We'll use the syntax:

```
%%sql --database {database_id} --container {container_id} --output outputDataframeVar
{Query text}
```

This allows us to output the results of the query directly into a Pandas data frame.


In [34]:
%%sql --database RetailDemo --container WebsiteData --output df_cosmos
SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c

In [35]:
# See a sample of the result
df_cosmos.head(10)

| | Action | ItemRevenue | Country | Item |
|---|---|---|---|---|
| 0 | Viewed | 9.0 | Tunisia | Black Tee |
| 1 | Viewed | 19.99 | Antigua and Barbuda | Flannel Shirt |
| 2 | Added | 3.75 | Guinea-Bissau | Socks |
| 3 | Viewed | 3.75 | Guinea-Bissau | Socks |
| 4 | Viewed | 55.0 | Czech Republic | Rainjacket |
| 5 | Viewed | 350.0 | Iceland | Cosmos T-shirt |
| 6 | Added | 19.99 | Syrian Arab Republic | Button-Up Shirt |
| 7 | Viewed | 19.99 | Syrian Arab Republic | Button-Up Shirt |
| 8 | Viewed | 33.0 | Tuvalu | Red Top |
| 9 | Viewed | 14.0 | Cape Verde | Flip Flop Shoes |


We can get more information about the ```%%sql``` command using ```%%sql?```.

In [36]:
%%sql?

Docstring:
::

  %sql [--database DATABASE] [--container CONTAINER] [--output OUTPUT]

Queries Azure Cosmos DB using the given Cosmos database and container.
Learn about the Cosmos query language: https://aka.ms/CosmosQuery

Example:
    %%sql --database databaseName --container containerName
    SELECT top 1 r.id, r._ts from r order by r._ts desc

optional arguments:
  --database DATABASE, -d DATABASE
                        If provided, this Cosmos database will be used;
  --container CONTAINER, -c CONTAINER
                        If provided, this Cosmos container will be used;
  --output OUTPUT       The dataframe of the result will be stored in a
                        variable with this name.
File:      ~/.local/lib/python3.6/site-packages/cosmos_sql/__init__.py


### Next steps

Now that you've learned how to use basic notebook functionality, follow the **Visualization.ipynb** notebook to further analyze and visualize our data. You can find it under the **Sample Notebooks** section.