# Quality Analysis of Cell Data

In [None]:
!pip install hana_ml
!pip install hdfs

In [2]:
# Import the HANA connection classes.
# hana_ml makes it possible to create a pandas DataFrame from HANA table selections.
# Details: https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.04/en-US/hana_ml.html
import hana_ml
import hana_ml.dataframe as dataframe
from notebook_hana_connector.notebook_hana_connector import NotebookConnectionContext
import hdfs

# Usual packages for data science
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

## Connection to Data Source
 * Open the connection to the HANA DB using the credentials stored in the Connection Management.
 * Read the table CELLSTATUS from the schema TECHED into the pandas DataFrame df.
 * Display the DataFrame.

In [None]:
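# Open the HANA connection defined in the Connection Management and
# load the whole table TECHED.CELLSTATUS into a pandas DataFrame
conn = NotebookConnectionContext(connectionId='HANA_CLOUD_TECHED')
df = conn.table('CELLSTATUS', schema='TECHED').collect()
display(df)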

## Configuration Setting and Performance over Time

Create two charts of "KEY1" and "KEY2" over time that compare the measured performance values ("KEY1", "KEY2") with the configuration settings ("NOM_KEY1", "NOM_KEY2").
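The charts compare measurement and configuration setting visually. To put a number on the gap, one option is the relative deviation of each measurement from its configured value. The sketch below is an illustration, not part of the original analysis: it assumes the column layout used in this notebook ("KEY1" next to "NOM_KEY1"), and the choice of a relative rather than an absolute difference is an assumption. The sample rows are made up.

```python
import pandas as pd

def config_deviation(frame: pd.DataFrame, key: str) -> pd.Series:
    """Relative deviation of a measurement from its configuration setting.

    Assumes the notebook's column layout: `key` (e.g. 'KEY1') holds the measured
    value and 'NOM_' + key holds the configured (nominal) value.
    Rows with a nominal value of 0 yield inf or NaN.
    """
    nominal = frame['NOM_' + key]
    return (frame[key] - nominal) / nominal

# Made-up rows for illustration; in the notebook, pass the DataFrame `df` instead.
sample = pd.DataFrame({'KEY1': [98.0, 103.0, 100.0], 'NOM_KEY1': [100.0, 100.0, 100.0]})
deviation = config_deviation(sample, 'KEY1')
print(deviation.round(3).tolist())  # [-0.02, 0.03, 0.0]
```

In the notebook, `config_deviation(df, 'KEY2').describe()` would summarize how far the measurements drift from the setting.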

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 5))
fig.suptitle('CELLSTATUS', y=0.99)

# One panel per key: red = configuration setting (NOM_KEYx), default color = measurement
for ax, key in ((ax1, 'KEY1'), (ax2, 'KEY2')):
    ax.plot(df['DATE'], df['NOM_' + key], color='red')
    ax.plot(df['DATE'], df[key])
    ax.legend(['Config Setting', 'Measurement'])
    # One major tick per month, labelled with the full month name
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%B'))
    ax.set_title(key)

plt.show()

## Histogram of KEY1 and KEY2
Plot the value distributions of "KEY1" and "KEY2" as overlaid histograms with 50 bins each.

In [None]:
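fig, ax = plt.subplots(figsize=(18, 5))
# Overlaid histograms (50 bins each); alpha keeps the overlap visible
ax.hist(df['KEY1'], 50, facecolor='green', alpha=0.75, label='KEY1')
ax.hist(df['KEY2'], 50, facecolor='blue', alpha=0.75, label='KEY2')
ax.legend()
plt.show()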

## Statistical Description
Assuming a normal distribution, the three-sigma rule states that 99.73% of the values lie within ±3 standard deviations (std) of the mean, so about 0.27% are expected to fall outside.

1. Calculate the mean value and the standard deviation.
2. Count the values outside the 3-sigma interval and compare that count with the expected number (0.27% of all values).
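The two steps can be sketched as a small helper that works on any pandas Series, such as df['KEY1'] or df['KEY2']. This is an illustration under stated assumptions, not the notebook's original code: the name `three_sigma_summary` is hypothetical, the sample standard deviation (pandas default, ddof=1) is used, and a value counts as outside only if it is strictly more than 3 std away from the mean. The expected fraction 1 - erf(3/√2) ≈ 0.0027 is the exact normal-distribution value behind the 99.73% figure.

```python
import math
import numpy as np
import pandas as pd

# Fraction of a normal distribution outside mean ± 3*std: 1 - erf(3/sqrt(2)) ≈ 0.0027
OUTSIDE_FRACTION = 1.0 - math.erf(3.0 / math.sqrt(2.0))

def three_sigma_summary(values: pd.Series) -> dict:
    """Compare the observed number of values outside mean ± 3*std with the normal expectation."""
    values = values.dropna()
    mean, std = values.mean(), values.std()  # pandas uses the sample std (ddof=1)
    outside = int(((values - mean).abs() > 3 * std).sum())
    expected = OUTSIDE_FRACTION * len(values)
    return {
        'mean': mean,
        'std': std,
        'n': len(values),
        'outside': outside,
        'expected': expected,
        # Ratio > 1: more outliers than a normal distribution would produce
        'ratio': outside / expected if expected else float('nan'),
    }

# Synthetic normal data stands in for df['KEY1'] / df['KEY2'] here.
rng = np.random.default_rng(0)
summary = three_sigma_summary(pd.Series(rng.normal(50.0, 5.0, 100_000)))
print(summary)
```

For truly normal data the ratio stays close to 1; a ratio well above 1 hints at heavier tails or genuine outliers.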

### KEY1 for all Cells

### KEY2 for all Cells

### For Each Cell of KEY2
For each cell, compare the actual number of "KEY2" values outside the 3-sigma boundaries with the expected number.
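A per-cell version of the comparison can be sketched with a pandas groupby, computing mean and standard deviation separately within each cell. The cell identifier column is not visible in this notebook, so 'CELLID' below is a hypothetical name that must be replaced with the actual column of CELLSTATUS; the function name and the 'deviation' column (observed minus expected count) are illustrative choices, and the demo data is made up.

```python
import math
import pandas as pd

# Expected fraction outside mean ± 3*std for a normal distribution (≈ 0.0027)
OUTSIDE_FRACTION = 1.0 - math.erf(3.0 / math.sqrt(2.0))

def per_cell_three_sigma(frame: pd.DataFrame, key: str = 'KEY2',
                         cell_col: str = 'CELLID') -> pd.DataFrame:
    """Observed vs. expected number of 3-sigma outliers, computed separately per cell.

    'CELLID' is a hypothetical name for the cell identifier column; adjust it
    to the actual column in CELLSTATUS.
    """
    grouped = frame.groupby(cell_col)[key]
    # Mean and sample std of each cell, broadcast back to every row of that cell
    cell_mean = grouped.transform('mean')
    cell_std = grouped.transform('std')
    is_outside = (frame[key] - cell_mean).abs() > 3 * cell_std

    result = pd.DataFrame({
        'n': grouped.count(),
        'outside': is_outside.groupby(frame[cell_col]).sum(),
    })
    result['expected'] = OUTSIDE_FRACTION * result['n']
    # Positive deviation: more outliers than a normal distribution would produce
    result['deviation'] = result['outside'] - result['expected']
    return result.sort_values('deviation', ascending=False)

# Made-up data: cell A has one extreme value, cell B has none.
demo = pd.DataFrame({
    'CELLID': ['A'] * 100 + ['B'] * 100,
    'KEY2': [0.0] * 99 + [100.0] + [1.0, -1.0] * 50,
})
print(per_cell_three_sigma(demo))
```

Sorting by deviation puts the most suspicious cells first, so `per_cell_three_sigma(df).head(10)` would list candidates for a closer look.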

### Detailed Look at the Outliers
For cells whose number of outliers deviates from the expected number, check how the values depend on time.
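One way to check the time dependency is to plot a single cell's values over time together with its mean ± 3 std band and mark the points outside the band. This is a sketch, not the original notebook code: 'CELLID' is again a hypothetical name for the cell identifier column, 'DATE' and 'KEY2' follow the columns used above, and the function name `plot_cell_outliers` is illustrative. The demo data is made up.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_cell_outliers(frame: pd.DataFrame, cell, key: str = 'KEY2',
                       cell_col: str = 'CELLID', date_col: str = 'DATE'):
    """Plot one cell's values over time with its mean ± 3*std band and mark the values outside.

    'CELLID' is a hypothetical name for the cell identifier column.
    Returns the figure and the rows that fall outside the band.
    """
    cell_data = frame[frame[cell_col] == cell].sort_values(date_col)
    values = cell_data[key]
    mean, std = values.mean(), values.std()
    outside = (values - mean).abs() > 3 * std

    fig, ax = plt.subplots(figsize=(18, 5))
    ax.plot(cell_data[date_col], values, label=key)
    ax.axhline(mean, color='grey', linestyle='--', label='Mean')
    ax.axhspan(mean - 3 * std, mean + 3 * std, color='grey', alpha=0.15, label='Mean ± 3 std')
    # Highlight the values outside the 3-sigma band
    ax.scatter(cell_data.loc[outside, date_col], values[outside],
               color='red', zorder=3, label='Outside 3 sigma')
    ax.set_title(f'{key} of cell {cell}')
    ax.legend()
    return fig, cell_data[outside]

# Made-up data for one cell with a single spike on the last day.
demo = pd.DataFrame({
    'CELLID': ['A'] * 100,
    'DATE': pd.date_range('2021-01-01', periods=100, freq='D'),
    'KEY2': [0.0] * 99 + [100.0],
})
fig, outlier_rows = plot_cell_outliers(demo, 'A')
print(outlier_rows)
```

Clusters of red points in a short period suggest a temporary event, while points spread evenly over time suggest that the distribution of that cell is simply wider than assumed.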

## Access Data on DI Data Lake