<a href="https://colab.research.google.com/github/amplabs-ai/amplabs/blob/main/python/analyzing_and_manipulating_data_with_the_pydata_stack%20/example_00/MRS_TRI_AmpLabs_Exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MRS - TRI AmpLabs Exercises




In this demo you will learn how to:

1. Upload and download various types of battery data through the **AmpLabs JSON API**
2. Perform simple filtering on battery data

**Pandas** is a Python library that provides easy-to-use data structures and data analysis tools. It can import data from various sources such as JSON, SQL databases, and Microsoft Excel. When working with tabular data, such as data stored in spreadsheets or databases, pandas helps you explore, clean, and process your data. In pandas, a data table is called a **Data Frame**.

Note: Pandas is built on top of another library called **NumPy**.


# Key Terms

**API** An Application Programming Interface (API) is a set of definitions and protocols for building and integrating application software. AmpLabs provides an API to help you access and control your data.

**JSON** JavaScript Object Notation (JSON) is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate.

[JSON](https://www.json.org/json-en.html) is commonly used as an output format from websites. AmpLabs provides a JSON API for your Battery Data. Records for your battery data look like the following: 

``` JSON
{
  "detail": "Records Retrieved", 
  "records": [
    {
      "Charge_Capacity (Ah)": 2.563, 
      "Charge_Energy (Wh)": 10.029, 
      "Cycle_Index": 1, 
      "Discharge_Capacity (Ah)": 2.709, 
      "Discharge_Energy (Wh)": 9.424, 
      "End_Time": null, 
      "Max_Current (A)": 1.496, 
      "Max_Voltage (V)": 4.2, 
      "Min_Current (A)": -1.503, 
      "Min_Voltage (V)": 1.999, 
      "Start_Time": null, 
      "Test_Time (s)": 14644.703
    }
  ],
  "status": 200
}
```
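
As a minimal sketch of how a payload shaped like the one above can be turned into a table, the snippet below parses a JSON string with the standard `json` module and passes the `records` list to `pd.DataFrame`. The payload is a shortened, hand-written copy of the sample record, not a live API response.

```python
import json

import pandas as pd

# Shortened, hand-written copy of the sample payload (not a live response)
payload_text = """
{
  "detail": "Records Retrieved",
  "records": [
    {"Cycle_Index": 1, "Charge_Capacity (Ah)": 2.563, "Discharge_Capacity (Ah)": 2.709, "End_Time": null}
  ],
  "status": 200
}
"""

payload = json.loads(payload_text)            # JSON text -> Python dict
cycle_df = pd.DataFrame(payload["records"])   # list of records -> one row per record

print(cycle_df.shape)
print(cycle_df.columns.tolist())
```

Note that JSON `null` becomes Python `None` after parsing, which pandas treats as a missing value.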

**Data Frame** A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet, a SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like a Series, a DataFrame accepts many different kinds of input, for example a dict of lists (see the [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) reference):



``` Python
import pandas as pd

# Each dict key becomes a column name; each list holds that column's values
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
```
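
To illustrate the "dict of Series" view, here is a small sketch that builds a frame with columns of different types and inspects it. The column names and values are made up for illustration.

```python
import pandas as pd

# Made-up values: one integer, one float, and one string column
cells = pd.DataFrame({
    "cycle": [1, 2, 3],
    "voltage": [4.2, 4.1, 4.0],
    "label": ["a", "b", "c"],
})

print(cells.shape)           # (rows, columns)
print(cells.dtypes)          # each column keeps its own type
voltage = cells["voltage"]   # selecting one column returns a Series
print(type(voltage).__name__)
```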

In [None]:
import sys
!{sys.executable} -m pip install pandas plotly kaleido

Collecting kaleido
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
     |████████████████████████████████| 79.9 MB 101 kB/s
Installing collected packages: kaleido
Successfully installed kaleido-0.2.1


Import the libraries we need into the Python environment.

In [None]:
# Standard library utilities

import gzip
import json
import shutil
import time

# Useful for fetching data from the web

import urllib.error
import urllib.request
from urllib.parse import urlencode

import requests

# PyData Libraries

import pandas as pd

# Define Helper Functions

The following section defines functions that will help us **read** data from and **write** data to the AmpLabs JSON API. 



## Functions to help Upload Data to AmpLabs


In [None]:
# Function that triggers the upload of a Pandas DataFrame to AmpLabs
def upload_df_to_amplabs(user_token, cell_id, df):
    filename = cell_id
    print('Initializing upload...', end='')
    response, status = init_upload_to_amplabs(user_token, cell_id)
    time.sleep(2)
    if not status:
        print('Failed.')
        return
    print('Done.')
    print('Starting upload...', end='')
    # Map the AmpLabs column names to the names used in the uploaded file
    col_names = {'Cycle_Index': 'cycle', 'Test_Time (s)': 'test_time',
                 'Current (A)': 'current', 'Voltage (V)': 'voltage',
                 'Discharge_Capacity (Ah)': 'discharge_capacity',
                 'Discharge_Energy (Wh)': 'discharge_energy',
                 'Charge_Capacity (Ah)': 'charge_capacity',
                 'Charge_Energy (Wh)': 'charge_energy'}
    # Keep only the mapped columns, sort by cycle then time, and rename.
    # sort_values and rename return new frames, so the result is reassigned.
    upload_df = (df[list(col_names)]
                 .sort_values(by=['Cycle_Index', 'Test_Time (s)'])
                 .rename(columns=col_names))
    # Write the CSV; index=True keeps the row index as the first column
    upload_df.to_csv(filename, index=True)

    # Compress the CSV before sending it
    with open(filename, 'rb') as src, gzip.open(filename + '.gz', 'wb') as dst:
        dst.writelines(src)

    url = 'https://www.amplabs.ai/upload/cells/generic'
    data = {'cell_id': cell_id}
    headers = {'Authorization': 'Bearer {}'.format(user_token)}

    try:
        # Open the compressed file in a context manager so it is always closed
        with open(filename + '.gz', 'rb') as gz_file:
            r = requests.post(url, files={'file': gz_file}, data=data, headers=headers)
        # Raise HTTPError on 4xx/5xx responses so failures are not reported as 'Done.'
        r.raise_for_status()
        print('Done.')
        return
    except requests.exceptions.HTTPError as errh:
        print("Http Error:", errh)
    except requests.exceptions.ConnectionError as errc:
        print("Error Connecting:", errc)
    except requests.exceptions.Timeout as errt:
        print("Timeout Error:", errt)
    except requests.exceptions.RequestException as err:
        print("Oops, something else went wrong:", err)
    print('Failed.')

# Function used to notify AmpLabs to prepare for an upload
def init_upload_to_amplabs(user_token, cell_id):
    url = 'https://www.amplabs.ai/upload/cells/initialize'
    data = {'test_type': 'cycle',
            'file_count': '1',
            'cell_id': cell_id
            }
    httprequest = urllib.request.Request(url, data=urlencode(data).encode('utf-8'), method="POST")
    httprequest.add_header("Authorization", "Bearer {}".format(user_token))
    try:
        with urllib.request.urlopen(httprequest) as httpresponse:
            response = json.loads(httpresponse.read())
            return response, 1
    except urllib.error.HTTPError as e:
        return None, 0 
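
The upload helper above writes a CSV file and then gzip-compresses it. As a rough sketch of that compression step, the snippet below does the same round trip in memory rather than on disk, which is an assumption made here to keep the example self-contained. The sample rows are made up, and `index=False` is a choice for this sketch only.

```python
import gzip
import io

import pandas as pd

# Made-up sample rows using the renamed column names from the upload helper
sample = pd.DataFrame({
    "cycle": [1, 1],
    "test_time": [0.0, 10.0],
    "voltage": [3.7, 3.8],
})

csv_bytes = sample.to_csv(index=False).encode("utf-8")  # DataFrame -> CSV bytes
compressed = gzip.compress(csv_bytes)                    # CSV bytes -> gzip bytes

# Reverse the steps to confirm nothing was lost
restored = pd.read_csv(io.BytesIO(gzip.decompress(compressed)))
print(restored)
```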

## Functions to fetch cycle, timeseries, and metadata from AmpLabs


In [None]:
# Function used to fetch cycle data from AmpLabs
def get_amplabs_cycledata(user_token, cell_id):
    url = 'https://www.amplabs.ai/download/cells/cycle_data_json?cell_id={}'.format(cell_id)
    httprequest = urllib.request.Request( url, method="GET")
    httprequest.add_header("Authorization", "Bearer {}".format(user_token))
    try:
        with urllib.request.urlopen(httprequest) as httpresponse:
            response = json.loads(httpresponse.read())
            return response, 1
    except urllib.error.HTTPError as e:
        print(e)
    return None, 0

# Function used to fetch timeseries data from AmpLabs
def get_amplabs_timeseriesdata(user_token, cell_id):
    url = 'https://www.amplabs.ai/download/cells/cycle_timeseries_json?cell_id={}'.format(cell_id)
    httprequest = urllib.request.Request( url, method="GET")
    httprequest.add_header("Authorization", "Bearer {}".format(user_token))
    try:
        with urllib.request.urlopen(httprequest) as httpresponse:
            response = json.loads(httpresponse.read())
            return response, 1
    except urllib.error.HTTPError as e:
        print(e)
    return None, 0

# Function used to fetch cell metadata from AmpLabs
def get_amplabs_meta(user_token, cell_id):
    url = 'https://www.amplabs.ai/cells?cell_id={}'.format(cell_id)
    httprequest = urllib.request.Request( url, method="GET")
    httprequest.add_header("Authorization", "Bearer {}".format(user_token))
    try:
        with urllib.request.urlopen(httprequest) as httpresponse:
            response = json.loads(httpresponse.read())
            return response, 1
    except urllib.error.HTTPError as e:
        print(e)
    return None, 0
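
The fetch helpers above all follow the same pattern: build a URL with a `cell_id` query parameter, attach a bearer token, and send a GET request. The sketch below assembles such a request without sending it, so the pieces can be inspected. `build_get_request` is a hypothetical helper, not part of the AmpLabs API, and it uses `urlencode` to escape the query string, which the helpers above do not do.

```python
import urllib.request
from urllib.parse import urlencode

def build_get_request(base_url, user_token, **params):
    """Hypothetical helper: assemble (but do not send) an authenticated GET request."""
    url = "{}?{}".format(base_url, urlencode(params))  # urlencode escapes special characters
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", "Bearer {}".format(user_token))
    return request

req = build_get_request(
    "https://www.amplabs.ai/download/cells/cycle_data_json",
    "<token>",
    cell_id="sample cycle",  # made-up id containing a space
)
print(req.get_method())
print(req.full_url)
```

Escaping matters when a cell id contains spaces or characters such as `&`, which would otherwise change the meaning of the URL.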

**Check your understanding**
1. What is JSON? 

2. Is JSON machine readable?

3. What is a GET Request?

In [None]:
user_token = "<token>"
cell_id = 'sample_cycle'

In [None]:
get_amplabs_meta(user_token, cell_id)

({'detail': 'Records Retrieved',
  'records': [[{'ah': 0,
     'anode': None,
     'cathode': None,
     'cell_id': 'sample_cycle',
     'form_factor': None,
     'index': 252,
     'source': None,
     'test': 'cycle',
     'tester': None}]],
  'status': 200},
 1)

# Fetch data and store it in a data frame

In [None]:
response, status = get_amplabs_timeseriesdata(user_token, cell_id)
df = pd.DataFrame(response['records'])
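
The fetch helpers return `(None, 0)` when the request fails, in which case `response['records']` raises a `TypeError`. One possible guard is sketched below. `records_to_frame` is a hypothetical helper, and the inputs are stand-in values shaped like the helpers' return values rather than live data.

```python
import pandas as pd

def records_to_frame(response, status):
    """Hypothetical helper: build a DataFrame only when the fetch succeeded."""
    if not status or response is None:
        # The fetch helpers return (None, 0) on an HTTP error
        return pd.DataFrame()
    return pd.DataFrame(response.get("records", []))

# Stand-in values shaped like the helpers' return values (not live data)
ok = records_to_frame({"records": [{"Cycle_Index": 1, "Current (A)": 0.5}], "status": 200}, 1)
failed = records_to_frame(None, 0)
print(len(ok), failed.empty)
```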

In [None]:
# Filter or make transformations here
# Keep only the rows where the current is at least 1 A
high_current = df['Current (A)'] >= 1
df[high_current]

Unnamed: 0,Cell_Temperature (C),Charge_Capacity (Ah),Charge_Energy (Wh),Current (A),Cycle_Index,Date_Time,Discharge_Capacity (Ah),Discharge_Energy (Wh),Environment_Temperature (C),Test_Time (s),Voltage (V)
61,,0.000028,0.000094,4.076440,2,2022-05-08T19:26:55.074261Z,0.0,0.0,,60.1653,3.382267
62,,0.027460,0.093377,9.900045,2,2022-05-08T19:27:05.075061Z,0.0,0.0,,70.1661,3.402164
63,,0.054897,0.186770,9.900078,2,2022-05-08T19:27:15.075461Z,0.0,0.0,,80.1665,3.405151
64,,0.082337,0.280246,9.899794,2,2022-05-08T19:27:25.076061Z,0.0,0.0,,90.1671,3.407909
65,,0.109773,0.373769,9.900137,2,2022-05-08T19:27:35.075961Z,0.0,0.0,,100.1670,3.410454
...,...,...,...,...,...,...,...,...,...,...,...
31910,,33.010162,116.274391,1.014591,5,2022-05-12T11:05:59.039861Z,0.0,0.0,,315604.1309,3.597595
31913,,33.018482,116.304329,1.022742,5,2022-05-12T11:06:29.041261Z,0.0,0.0,,315634.1323,3.597816
31921,,33.040409,116.383209,1.026530,5,2022-05-12T11:07:49.045461Z,0.0,0.0,,315714.1365,3.597816
31931,,33.067482,116.480606,1.007761,5,2022-05-12T11:09:29.048161Z,0.0,0.0,,315814.1392,3.597706
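
The filter above is a boolean mask: a Series with one `True`/`False` value per row. As a small sketch on made-up rows (same column names as the timeseries data), two masks can be combined with `&` to keep only rows that satisfy both conditions.

```python
import pandas as pd

# Made-up rows with the same column names as the timeseries data
ts = pd.DataFrame({
    "Cycle_Index": [1, 2, 2, 3],
    "Current (A)": [0.5, 4.0, -1.5, 9.9],
    "Voltage (V)": [3.3, 3.4, 3.2, 3.5],
})

high_current = ts["Current (A)"] >= 1   # boolean Series, one value per row
later_cycles = ts["Cycle_Index"] > 1

# Combine masks with & (and) or | (or); when conditions are written inline,
# wrap each one in parentheses because & binds tighter than comparisons
subset = ts[high_current & later_cycles]
print(subset)
```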


In [None]:
# Keep only the rows from cycle 2 onward
after_first_cycle = df['Cycle_Index'] > 1
df = df[after_first_cycle]

In [None]:
# Uploading filtered file
new_cell_id = "NEW_{}".format(cell_id)
print(df.head())

   Cell_Temperature (C)  Charge_Capacity (Ah)  Charge_Energy (Wh)  \
60                 None              0.000000            0.000000   
61                 None              0.000028            0.000094   
62                 None              0.027460            0.093377   
63                 None              0.054897            0.186770   
64                 None              0.082337            0.280246   

    Current (A)  Cycle_Index                    Date_Time  \
60     0.000000            2  2022-05-08T19:26:54.912761Z   
61     4.076440            2  2022-05-08T19:26:55.074261Z   
62     9.900045            2  2022-05-08T19:27:05.075061Z   
63     9.900078            2  2022-05-08T19:27:15.075461Z   
64     9.899794            2  2022-05-08T19:27:25.076061Z   

    Discharge_Capacity (Ah)  Discharge_Energy (Wh)  \
60                      0.0                    0.0   
61                      0.0                    0.0   
62                      0.0                    0.0   
63

In [None]:
# response, status = get_amplabs_meta(user_token, new_cell_id)
# try_counter = 0
# while len(response['records'][0]) == 0:
#   try_counter += 1
#   print("Try Upload: ",try_counter)
#   upload_df_to_amplabs(user_token, new_cell_id, df)
#   time.sleep(2)
#   response, status = get_amplabs_meta(user_token, new_cell_id)

In [None]:
# print("Successfully uploaded after {} attempt(s)".format(try_counter))
# get_amplabs_meta(user_token, new_cell_id)

