# Part 4b: Data Science

In the final part of the workshop, we break out into two workstreams. This one takes the more technical path: we'll help you perform advanced analytics and/or model development!

Keep your notebook from part 3 close at hand, and use content from the developer workshop as reference material if you really want to leverage the ins and outs of CDP in your analysis.

We can't wait to see what you come up with! Good luck :)
<hr>

# Step 0: Environment setup

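```python
# If you're working in Google Colab or a similar hosted environment.
# Note: this notebook was written against an early (0.x) cognite-sdk release
# (e.g. `client.assets.get_asset_subtree`); current releases use a different
# client API, so you may need to install an older version for these cells to run.
!pip install -q cognite-sdk
```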

```python
%matplotlib inline

import os
from datetime import datetime, timedelta
from getpass import getpass

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

from cognite import CogniteClient

pd.set_option('display.max_rows', 10)

# Prompt for the API key so it is never stored in the notebook itself
client = CogniteClient(api_key=getpass("Open Industrial Data API-KEY: "))
```

# Step 1: Pull up the sub-assets and time series for your asset

```python
# Replace with the ID of the asset you chose in part 3 (53231887945301 is an example)
asset_id = 53231887945301
```

```python
df_asset_children = client.assets.get_asset_subtree(
    asset_id=asset_id,
    depth=10
).to_pandas().sort_values('depth')
df_asset_children[['depth', 'id', 'parentId', 'description']]
```

```python
df_asset_children_timeseries = client.time_series.get_time_series(path=str([asset_id])).to_pandas()
df_asset_children_timeseries
```

# Step 2: Explore

Part 4b is an open-ended investigation into industrial data. Cogniters will be around to guide you and help turn your ideas into quality analysis. Here are a few projects to get you started:

#### Data quality investigation
Sometimes the edge goes down, so we need to know how complete our datasets are. Play with `client.datapoints.get_datapoints_frame` to fetch a few years' worth of daily-resolution data, and come up with a good way of visualizing gaps in critical sensors. Can you spot times when the platform was shut down?
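A minimal sketch of one way to visualize gaps, assuming you already have a daily-resolution DataFrame with a `DatetimeIndex`, one column per sensor, and `NaN` on days without data. Check what `get_datapoints_frame` actually returns in your SDK version; it may hold timestamps in a column rather than the index. The synthetic frame, the sensor names, and the helpers `missing_day_mask` and `plot_gaps` are hypothetical stand-ins for your own data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def missing_day_mask(df: pd.DataFrame) -> pd.DataFrame:
    """Return a boolean frame: True where a sensor has no value on a given day."""
    # Reindex onto a complete daily calendar so that days absent from the frame
    # count as gaps too, not only days that are present but NaN.
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="D")
    return df.reindex(full_index).isna()


def plot_gaps(mask: pd.DataFrame):
    """Draw one row per sensor; dark cells mark days with no data."""
    fig, ax = plt.subplots(figsize=(12, 0.4 * mask.shape[1] + 1))
    ax.imshow(mask.T.values.astype(float), aspect="auto", cmap="Greys",
              interpolation="none")
    ax.set_yticks(range(mask.shape[1]))
    ax.set_yticklabels(mask.columns)
    # Label roughly ten evenly spaced dates along the x-axis.
    step = max(len(mask) // 10, 1)
    ax.set_xticks(range(0, len(mask), step))
    ax.set_xticklabels([d.strftime("%Y-%m-%d") for d in mask.index[::step]],
                       rotation=45, ha="right")
    ax.set_title("Missing days per sensor")
    fig.tight_layout()
    return fig, ax


# Synthetic example: one year of daily means for three hypothetical sensors.
rng = np.random.default_rng(0)
days = pd.date_range("2018-01-01", "2018-12-31", freq="D")
df_daily = pd.DataFrame(rng.normal(size=(len(days), 3)), index=days,
                        columns=["sensor_a", "sensor_b", "sensor_c"])
df_daily.iloc[100:130] = np.nan                              # every sensor silent
df_daily.loc[df_daily.index[200:210], "sensor_b"] = np.nan   # single-sensor outage
df_daily = df_daily.drop(df_daily.index[300:305])            # rows missing entirely

mask = missing_day_mask(df_daily)
print(mask.sum())  # number of missing days per sensor

# Days on which *every* sensor is missing are candidate shutdowns (or outages).
shutdown_days = mask.index[mask.all(axis=1).to_numpy()]
fig, ax = plot_gaps(mask)
```

Separating "all sensors missing" from "one sensor missing" is what lets you tell a likely platform-wide event from a single faulty instrument; on real data you would confirm candidate shutdowns against the process values around them.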

#### Graph analytics
Use the `networkx` Python package (`networkx.from_pandas_edgelist`, named `from_pandas_dataframe` before networkx 2.0) to draw some cool visualizations of the asset hierarchy.
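A minimal sketch, assuming a frame shaped like `df_asset_children` from Step 1 (with `id`, `parentId` and `depth` columns). The small hand-made `df_assets` frame is a hypothetical stand-in so the snippet runs on its own. It uses `networkx.from_pandas_edgelist`, which is what networkx 2.0 renamed `from_pandas_dataframe` to.

```python
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# Hypothetical stand-in for df_asset_children from Step 1; swap in the real frame.
df_assets = pd.DataFrame({
    "id":       [1, 2, 3, 4, 5, 6],
    "parentId": [None, 1, 1, 2, 2, 3],
    "depth":    [0, 1, 1, 2, 2, 2],
})

# Keep only parent -> child links that stay inside the subtree. This drops the
# subtree root's link to its own parent (or a missing parentId).
inside = df_assets["parentId"].isin(df_assets["id"])
edges = df_assets[inside].astype({"parentId": "int64"})

G = nx.from_pandas_edgelist(edges, source="parentId", target="id",
                            create_using=nx.DiGraph())
G.add_nodes_from(df_assets["id"])  # make sure a childless root still appears

# An asset hierarchy should form a tree (connected, no cycles).
print("is tree:", nx.is_tree(G))

# Draw the hierarchy, colouring each node by its depth.
depth = df_assets.set_index("id")["depth"]
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color=[depth.loc[n] for n in G.nodes],
        cmap=plt.cm.viridis, node_size=400)
```

A spring layout is the simplest option to get running; if `pygraphviz` is installed, `nx.nx_agraph.graphviz_layout(G, prog="dot")` gives a top-down layout that usually reads better for deep hierarchies.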

#### Supervised sensor prediction
Use scikit-learn regressors to build a model that predicts the value of one (important) sensor from the other sensors. Think about how this could be extended towards anomaly detection, and about the potential pitfalls! A good understanding of the process will help.
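A minimal sketch, assuming you've already pulled an aligned frame of sensor values (one column per time series, one row per timestamp). The synthetic sensors, the injected fault, and the three-standard-deviation residual threshold are hypothetical choices for illustration, not a recommended setup. The split is chronological rather than random: neighbouring samples of a real process are strongly correlated, so a shuffled split tends to flatter the model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for an aligned sensor frame: "flow" depends on two other sensors.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "pressure": rng.normal(50, 5, n),
    "temperature": rng.normal(80, 3, n),
})
df["flow"] = 0.8 * df["pressure"] + 0.5 * df["temperature"] + rng.normal(0, 1, n)
df.loc[1900:1909, "flow"] += 15  # injected fault in the evaluation period

features, target = ["pressure", "temperature"], "flow"

# Chronological split: train on the first 75% of samples, evaluate on the rest.
split = int(0.75 * n)
train, test = df.iloc[:split], df.iloc[split:]

model = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
model.fit(train[features], train[target])

pred = model.predict(test[features])
print("test MAE:", mean_absolute_error(test[target], pred))

# In-sample residuals of a random forest are optimistically small, so estimate
# the normal residual spread from out-of-bag predictions instead.
oob_resid = train[target] - model.oob_prediction_
threshold = 3 * oob_resid.std()

# Flag evaluation samples whose residual exceeds the threshold.
residual = test[target] - pred
anomalies = residual[residual.abs() > threshold].index
print("flagged samples:", list(anomalies))
```

One pitfall this sketch does not address: if a fault also shifts one of the input sensors, the model may "explain away" the anomaly, and a fixed threshold can drift out of date when operating conditions change.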

#### Unsupervised anomaly detection
Build an unsupervised anomaly detection model (Isolation Forest, k-means, or PCA, to name a few) and devise an anomaly score.
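A minimal sketch using scikit-learn's `IsolationForest`, one of the options listed above. The synthetic two-sensor data and the injected outliers are hypothetical. `score_samples` returns the opposite of the anomaly score from the original Isolation Forest paper, so lower means more abnormal; negating it gives a score where higher means more anomalous. Isolation Forest splits one feature at a time, so it is largely insensitive to per-sensor scaling; k-means and PCA are not, and you would standardize sensors first for those.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic stand-in: two strongly correlated sensors during normal operation.
rng = np.random.default_rng(2)
n = 1000
pressure = rng.normal(50, 5, n)
temperature = 0.6 * pressure + rng.normal(0, 1, n)
df = pd.DataFrame({"pressure": pressure, "temperature": temperature})

# Inject five samples that break the usual pressure/temperature relationship.
outlier_idx = [100, 300, 500, 700, 900]
df.loc[outlier_idx, "temperature"] += 25

X = df[["pressure", "temperature"]].to_numpy()
iso = IsolationForest(n_estimators=200, random_state=0).fit(X)

# score_samples is lower for more abnormal samples; negate it so that a
# higher anomaly_score means "more anomalous".
df["anomaly_score"] = -iso.score_samples(X)
top = df.nlargest(10, "anomaly_score")
print(top)
```

A continuous score like this is often more useful than hard labels: you can rank time windows for inspection, plot the score over time next to the raw sensors, and choose an alert threshold afterwards.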


## Tips
- A concise, coherent analysis is more useful than a lengthy notebook that does many shiny things but doesn't reach any conclusions
- Ask questions about the assets or read up on them ([ABB's handbook on Oil & Gas](https://library.e.abb.com/public/34d5b70e18f7d6c8c1257be500438ac3/Oil%20and%20gas%20production%20handbook%20ed3x0_web.pdf)).
- Work together and share your results
- Send us your notebook by opening a pull request to the [contributions folder](https://github.com/cognitedata/open-industrial-data/tree/master/contributions)! A well structured, clear analysis on your github profile and shared with the Open Industrial Data community could be your foot in the door to your next dream job in Data Science!