<a href="https://colab.research.google.com/github/amplabs-ai/amplabs/blob/main/python/analyzing_and_manipulating_data_with_the_pydata_stack%20/example_00/MRS_TRI_AmpLabs_Exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MRS - TRI AmpLabs Exercises




In this demo you will learn how to:

1. Upload/Download various types of battery data from **AmpLabs JSON API**
2. Perform simple filtering on battery data

**Pandas** is a Python library that provides easy-to-use data structures and data analysis tools. It can import data from many formats, including JSON, SQL, and Microsoft Excel. When working with tabular data, such as data stored in spreadsheets or databases, pandas helps you explore, clean, and process your data. In pandas, a data table is called a **DataFrame**.

Note: Pandas is built on top of another library called **NumPy**.
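
As a minimal sketch of the point above, the following reads the same small table from CSV text and from JSON text, then converts it to a NumPy array. The column names and values are made up for illustration and are not AmpLabs data.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Two tiny in-memory "files" holding the same made-up table.
csv_text = "cycle_index,voltage\n1,3.60\n2,3.58\n"
json_text = '[{"cycle_index": 1, "voltage": 3.60}, {"cycle_index": 2, "voltage": 3.58}]'

# Each reader returns a DataFrame.
df_from_csv = pd.read_csv(StringIO(csv_text))
df_from_json = pd.read_json(StringIO(json_text))

# The values of a DataFrame can be exposed as a NumPy array.
values = df_from_csv.to_numpy()

print(df_from_csv)
print(df_from_json)
print(type(values))
```

Wrapping the text in `StringIO` lets the readers treat a string like a file, which keeps the example independent of any file on disk.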


# Key Terms

**API**: An Application Programming Interface (API) is a set of definitions and protocols for building and integrating application software. AmpLabs provides an API to help you access and control your data.

**JavaScript Object Notation (JSON)** is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. Win/win.

[JSON](https://www.json.org/json-en.html) is commonly used as the response format of web APIs. AmpLabs provides a JSON API for your battery data. A response containing cycle records looks like the following:

``` JSON
{
    "detail": "Records Retrieved",
    "records": [
        [
            {
                "charge_step_count": null,
                "cycle_charge_capacity": 1.026650590728431,
                "cycle_charge_energy": 3.5721314343688153,
                "cycle_coulombic_efficiency": 1.0020317221497879,
                "cycle_discharge_capacity": 1.0287364594737067,
                "cycle_discharge_energy": 3.129065924887601,
                "cycle_duration": 2529.3717,
                "cycle_end_timestamp": "2022-09-06T17:42:31.223598Z",
                "cycle_energy_efficiency": 0.8759660674245312,
                "cycle_index": 1,
                "cycle_max_charge_power": 23.36383866394476,
                "cycle_max_current": 6.6025085,
                "cycle_max_discharge_power": 15.200998988505711,
                "cycle_max_power": 23.36383866394476,
                "cycle_max_rest_voltage": 3.5980968,
                "cycle_max_voltage": 3.6001949,
                "cycle_mean_charge_power": 9.714367248195945,
                "cycle_mean_charge_voltage": 3.3811619645885287,
                "cycle_mean_discharge_power": -10.992575217325712,
                "cycle_mean_discharge_voltage": 2.793457597077922,
                "cycle_mean_power": 0.703100827021039,
                "cycle_mean_voltage": 3.1228079950344827,
                "cycle_min_charge_power": 0.15608810761640943,
                "cycle_min_current": -4.4012828,
                "cycle_min_discharge_power": 0.109924433457156,
                "cycle_min_power": 0.0,
                "cycle_min_rest_voltage": 2.0303135,
                "cycle_min_voltage": 1.9997959,
                "cycle_resistance_end_of_charge": 0.197672471838036,
                "cycle_resistance_end_of_discharge": -0.1099885647713824,
                "cycle_resistance_start_of_charge": 0.993497447188566,
                "cycle_resistance_start_of_discharge": -1.89296425266147,
                "cycle_start_timestamp": "2022-09-06T17:00:21.851898Z",
                "cycle_time": null,
                "cycle_total_rest_time": 105613.492,
                "cycle_voltage_efficiency": null,
                "datapoint_count": 725,
                "discharge_step_count": null,
                "dt_end_of_charge": 2.1689000000005763,
                "dt_end_of_discharge": 3.8768999999992957,
                "dt_start_of_charge": 0.0,
                "dt_start_of_discharge": 0.16220000000066648,
                "dv_end_of_charge": -0.0001811999999996594,
                "dv_end_of_discharge": 9.14999999999111e-05,
                "dv_start_of_charge": 0.0,
                "dv_start_of_discharge": -0.010403600000000068,
                "rest_step_count": null,
                "step_count": null,
                "test_time": 7527.4484
            }
        ]
    ],
    "status": 200
}
```
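
To make the structure of that response concrete, here is a small sketch that parses a trimmed copy of the sample (only a few fields kept) and loads the inner list of records into a DataFrame. It also recomputes two fields from the others. In this sample the coulombic efficiency matches discharge capacity divided by charge capacity, and the duration matches the gap between the two timestamps. That is an observation about these values, not a documented definition from AmpLabs.

```python
import json

import pandas as pd

# A trimmed copy of the sample response shown above.
payload_text = """
{
    "detail": "Records Retrieved",
    "records": [[{
        "cycle_index": 1,
        "cycle_charge_capacity": 1.026650590728431,
        "cycle_discharge_capacity": 1.0287364594737067,
        "cycle_coulombic_efficiency": 1.0020317221497879,
        "cycle_start_timestamp": "2022-09-06T17:00:21.851898Z",
        "cycle_end_timestamp": "2022-09-06T17:42:31.223598Z",
        "cycle_duration": 2529.3717,
        "cycle_time": null
    }]],
    "status": 200
}
"""

# JSON text -> Python dict; JSON null becomes Python None.
payload = json.loads(payload_text)

# "records" is a list containing one list of record dicts: one row per cycle.
cycle_df = pd.DataFrame(payload["records"][0])

# Recompute two fields from the others to see how they relate in this sample.
ratio = cycle_df["cycle_discharge_capacity"] / cycle_df["cycle_charge_capacity"]
elapsed = (
    pd.to_datetime(cycle_df["cycle_end_timestamp"])
    - pd.to_datetime(cycle_df["cycle_start_timestamp"])
).dt.total_seconds()

print(cycle_df[["cycle_index", "cycle_coulombic_efficiency", "cycle_duration"]])
print(ratio.iloc[0], elapsed.iloc[0])
```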

**DataFrame**: A [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input, for example a dict of lists:



``` Python
import pandas as pd

# Each dict key becomes a column name; each list holds that column's values
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
```
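
As a further sketch of the "many different kinds of input" point, the same table can be built from a dict of lists, a list of record dicts (the shape the AmpLabs JSON records take), or a dict of Series. The values here are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Dict of lists: one key per column.
from_lists = pd.DataFrame({
    "cycle_index": [1, 2],
    "step_type": ["Charge", "Discharge"],
})

# List of dicts: one dict per row.
from_records = pd.DataFrame([
    {"cycle_index": 1, "step_type": "Charge"},
    {"cycle_index": 2, "step_type": "Discharge"},
])

# Dict of Series: one Series per column.
from_series = pd.DataFrame({
    "cycle_index": pd.Series([1, 2]),
    "step_type": pd.Series(["Charge", "Discharge"]),
})

# Columns can hold different types (integers and text here).
print(from_lists.dtypes)
print(from_records)
print(from_series)
```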

In [1]:
import sys
!{sys.executable} -m pip install pandas plotly kaleido requests

Collecting kaleido
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
     |████████████████████████████████| 79.9 MB 101 kB/s
Installing collected packages: kaleido
Successfully installed kaleido-0.2.1


Import the libraries we downloaded into the Python environment

In [2]:
# Useful for fetching data from the web

import gzip
import json
import shutil
import time

import requests

# PyData Libraries

import pandas as pd

# Define Helper Functions

The following section defines functions that will help us **read** data from and **write** data to the AmpLabs JSON API. 



## Functions to help Upload Data to AmpLabs


In [54]:
# Function that triggers the upload of a Pandas DataFrame to AmpLabs
def upload_df_to_amplabs(user_token, cell_id, df):
    # Cell IDs may contain "/", which cannot appear in a file name
    filename = cell_id.replace("/", "-")
    print('Initializing upload...', end='')
    response, status = init_upload_to_amplabs(user_token, cell_id)
    time.sleep(5)
    if not status:
        print('Failed.')
        return
    print('Done.')
    print('Starting upload...', end='')
    keep_col = ['cycle_index', 'test_time', 'current', 'voltage', 'discharge_capacity', 'discharge_energy', 'charge_capacity', 'charge_energy']
    # sort_values returns a new DataFrame, so keep the sorted result
    upload_df = df[keep_col].sort_values(by=['cycle_index', 'test_time'])
    upload_df.to_csv(f"{filename}.csv")

    # Compress the CSV before sending it
    with open(f"{filename}.csv", 'rb') as src, gzip.open(filename + '.gz', 'wb') as dst:
        dst.writelines(src)

    url = 'https://www.app.amplabs.ai/upload/cells/generic'
    data = {"cell_id": cell_id}
    headers = {
        'Authorization': 'Bearer {}'.format(user_token)
    }

    try:
        # Open the archive in a with-block so the file handle is always closed
        with open(filename + '.gz', 'rb') as gz_file:
            r = requests.post(url, files={'file': gz_file}, data=data, headers=headers)
        # requests raises HTTPError for 4xx/5xx responses only when asked to
        r.raise_for_status()
        print('Done.')
        return
    except requests.exceptions.HTTPError as errh:
        print("Http Error:", errh)
    except requests.exceptions.ConnectionError as errc:
        print("Error Connecting:", errc)
    except requests.exceptions.Timeout as errt:
        print("Timeout Error:", errt)
    except requests.exceptions.RequestException as err:
        print("Oops, something else went wrong:", err)
    print('Failed.')

# Function used to notify AmpLabs to prepare for an upload
def init_upload_to_amplabs(user_token, cell_id):
    url = 'https://www.app.amplabs.ai/upload/cells/initialize'
    data = [{
        'test_type': 'cycle',
        'cell_id': cell_id,
        'is_public': True
    }]
    headers = {
        "Authorization": "Bearer {}".format(user_token),
        "Content-Type": "application/json"
    }
    try:
        response = requests.request("POST", url, headers=headers, data=json.dumps(data))
        return json.loads(response.text), 1
    except requests.exceptions.RequestException as e:
        print(e)
    return None, 0
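
The sort, CSV, and gzip steps in the upload helper can be tried without touching the network. The sketch below does the same steps in memory on a made-up DataFrame and then reads the compressed bytes back to confirm nothing was lost. Unlike the helper, it passes `index=False` to `to_csv`, which is a choice made here for a clean round trip and not something the AmpLabs API is known to require.

```python
import gzip
import io

import pandas as pd

# Made-up rows using a few of the column names the upload helper keeps.
demo = pd.DataFrame({
    "cycle_index": [2, 1, 1],
    "test_time": [30.0, 20.0, 10.0],
    "current": [1.5, 1.5, 1.5],
    "voltage": [3.4, 3.3, 3.2],
})

# sort_values returns a new DataFrame; the original is left unchanged.
ordered = demo.sort_values(by=["cycle_index", "test_time"])

# Serialize to CSV text, then gzip-compress the bytes in memory.
csv_bytes = ordered.to_csv(index=False).encode("utf-8")
gz_bytes = gzip.compress(csv_bytes)

# Reading the compressed bytes back reproduces the sorted table.
restored = pd.read_csv(io.BytesIO(gz_bytes), compression="gzip")
print(restored)
```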

## Functions to fetch cycle, timeseries, and meta data from AmpLabs


In [3]:
# Function used to fetch cycle data from AmpLabs
def get_amplabs_cycledata(user_token, cell_id):
    url = 'https://www.app.amplabs.ai/download/cells/cycle_data_json?cell_id={}'.format(cell_id)
    headers = {
    "Authorization": "Bearer {}".format(user_token),
    }
    try:
        response = requests.request("GET", url, headers=headers)
        res =  json.loads(response.text)
        response = requests.request("GET",res["response_url"])
        response = gzip.decompress(response.content)
        return json.loads(response.decode()), 1
    except requests.exceptions.RequestException as e:
        print(e)
    return None,0

# Function used to fetch timeseries data from AmpLabs
def get_amplabs_timeseriesdata(user_token, cell_id):
    url = 'https://www.app.amplabs.ai/download/cells/cycle_timeseries_json?cell_id={}'.format(cell_id)
    headers = {
    "Authorization": "Bearer {}".format(user_token),
    }
    try:
        response = requests.request("GET", url, headers=headers)
        res =  json.loads(response.text)
        response = requests.request("GET",res["response_url"])
        response = gzip.decompress(response.content)
        return json.loads(response.decode()), 1
    except requests.exceptions.RequestException as e:
        print(e)
    return None,0

# Function used to fetch meta data from AmpLabs
def get_amplabs_meta(user_token, cell_id):
    url = 'https://www.app.amplabs.ai/cells/cycle/metaWithId?cell_id={}'.format(cell_id)
    headers = {
    "Authorization": "Bearer {}".format(user_token),
    }
    try:
        response = requests.request("GET", url, headers=headers)
        response = json.loads(response.text)
        return response, 1
    except requests.exceptions.RequestException as e:
        print(e)
    return None,0
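
The two download helpers share the same last step: decompress a gzip body and parse it as JSON. The sketch below isolates that step so it can be tested locally, and also shows one way to percent-encode a `cell_id` that contains `/` when building the query string. `build_download_url` and `decode_gzip_json` are hypothetical helper names introduced here, and the fake body only imitates the shape of an AmpLabs response.

```python
import gzip
import json
from urllib.parse import urlencode


def build_download_url(base_url, cell_id):
    """Append cell_id as a percent-encoded query parameter."""
    return "{}?{}".format(base_url, urlencode({"cell_id": cell_id}))


def decode_gzip_json(content):
    """Decompress gzip bytes and parse the result as JSON."""
    return json.loads(gzip.decompress(content).decode("utf-8"))


# Simulate a downloaded body locally instead of calling the API.
fake_payload = {"records": [[{"cycle_index": 1}]], "status": 200}
fake_body = gzip.compress(json.dumps(fake_payload).encode("utf-8"))

decoded = decode_gzip_json(fake_body)
url = build_download_url(
    "https://www.app.amplabs.ai/download/cells/cycle_data_json",
    "SNL_18650_NMC_15C_0-100_0.5/1C_a",
)

print(url)
print(decoded["records"][0])
```

With `requests`, passing `params={"cell_id": cell_id}` to the request call performs the same encoding without building the URL by hand.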

**Check your understanding**
1. What is JSON? 

2. Is JSON machine readable?

3. What is a GET Request?

In [4]:
user_token = "<token>"
cell_id = 'sample_cycle'

In [5]:
get_amplabs_meta(user_token, cell_id)

({'detail': 'Records Retrieved',
  'records': [[{'ah': 0,
     'anode': None,
     'cathode': None,
     'cell_id': 'sample_cycle',
     'form_factor': None,
     'index': 252,
     'source': None,
     'test': 'cycle',
     'tester': None}]],
  'status': 200},
 1)

# Fetch data and store it in a DataFrame

In [6]:
response, status = get_amplabs_timeseriesdata(user_token, cell_id)
df = pd.DataFrame(response['records'][0])

In [7]:
# Filter or make transformations here
# A boolean mask: True for rows where the current is at least 1
mask = df['current'] >= 1
df[mask]

Unnamed: 0,capacity_throughput,cell_temperature,charge_capacity,charge_energy,cumulative_charge_capacity,cumulative_charge_energy,cumulative_discharge_capacity,cumulative_discharge_energy,current,cycle_charge_capacity,...,power,step_datapoint_ordinal,step_index,step_time,step_type,test_datapoint_ordinal,test_net_capacity,test_net_enerygy,test_time,voltage
6,0.001496,15.504,0.000,0.000,0.000,0.000,0.000,0.0,1.496,0.000,...,4.989160,,,,Charge,7,7.491370e+01,,60.080,3.335
7,0.279752,15.504,0.000,0.000,0.000,0.000,0.000,0.0,1.496,0.000,...,5.005616,,,,Charge,8,1.501071e+02,,60.267,3.346
8,0.933504,15.565,0.000,0.001,0.000,0.001,0.000,0.0,1.496,0.000,...,5.020576,,,,Charge,9,2.262341e+02,,60.891,3.356
9,2.939640,15.504,0.001,0.003,0.001,0.004,0.000,0.0,1.496,0.001,...,5.035536,,,,Charge,10,3.053007e+02,,62.856,3.366
10,4.248640,15.565,0.002,0.007,0.003,0.011,0.000,0.0,1.496,0.003,...,5.050496,,,,Charge,11,3.886159e+02,,65.696,3.376
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
47685,75.155056,18.263,1.296,5.086,71712.353,277745.190,20123.056,72411.5,1.252,187.172,...,5.257148,,,,Charge,47686,-3.796371e+09,,4565401.125,4.199
47686,71.252049,18.110,1.316,5.171,71713.669,277750.361,20123.056,72411.5,1.187,188.488,...,4.984213,,,,Charge,47687,-3.790952e+09,,4565461.152,4.199
47687,67.531500,18.125,1.335,5.252,71715.004,277755.613,20123.056,72411.5,1.125,189.823,...,4.723875,,,,Charge,47688,-3.785816e+09,,4565521.180,4.199
47688,64.049876,18.034,1.354,5.329,71716.358,277760.942,20123.056,72411.5,1.067,191.177,...,4.480333,,,,Charge,47689,-3.780944e+09,,4565581.208,4.199


In [8]:
# Keep only the rows that belong to cycles after the first one
mask = df['cycle_index'] > 1
df = df[mask]
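
The two filters above can also be combined into one. The sketch below uses a small made-up DataFrame (not the AmpLabs timeseries) to show how boolean masks are built and joined.

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "cycle_index": [1, 1, 2, 2, 3],
    "current": [1.5, -4.4, 0.5, 1.2, 1.1],
})

# Each comparison yields a boolean Series with one True/False per row.
high_current = demo["current"] >= 1
later_cycles = demo["cycle_index"] > 1

# Combine masks with & (and) or | (or). When the comparisons are written
# inline, each one needs its own parentheses.
selected = demo[high_current & later_cycles]
print(selected)
```

Indexing with a mask keeps the original row labels, which is why filtered output can start at an index other than 0.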

In [30]:
# Uploading filtered file
new_cell_id = "NEW_{}".format(cell_id)
print(df.head())

      capacity_throughput  cell_temperature  charge_capacity  charge_energy  \
1266             0.140624            17.636              0.0          0.000   
1267             0.233376            17.636              0.0          0.000   
1268             0.327624            17.636              0.0          0.000   
1269             0.700128            17.636              0.0          0.001   
1270             1.072632            17.559              0.0          0.001   

      cumulative_charge_capacity  cumulative_charge_energy  \
1266                    2432.339                  9428.476   
1267                    2432.339                  9428.476   
1268                    2432.339                  9428.476   
1269                    2432.339                  9428.477   
1270                    2432.339                  9428.478   

      cumulative_discharge_capacity  cumulative_discharge_energy  current  \
1266                       1010.459                      3695.51    1.496  
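
Before uploading, it can help to confirm that the DataFrame still has every column the upload helper selects. The following is a minimal sketch of such a check. The column list is copied from `keep_col` in `upload_df_to_amplabs`, while `missing_upload_columns` is a hypothetical helper name and the sample frame is made up.

```python
import pandas as pd

# Column names copied from keep_col in upload_df_to_amplabs.
REQUIRED_COLUMNS = [
    'cycle_index', 'test_time', 'current', 'voltage',
    'discharge_capacity', 'discharge_energy',
    'charge_capacity', 'charge_energy',
]


def missing_upload_columns(frame):
    """Return the required column names that are absent from frame."""
    return [col for col in REQUIRED_COLUMNS if col not in frame.columns]


# Hypothetical frame that lacks the capacity and energy columns.
partial = pd.DataFrame({
    "cycle_index": [2],
    "test_time": [0.0],
    "current": [1.496],
    "voltage": [3.3],
})

print(missing_upload_columns(partial))
```

An empty list means `df[keep_col]` inside the helper will not raise a `KeyError`.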

In [55]:
# response, status = get_amplabs_meta(user_token, new_cell_id)
# upload_df_to_amplabs(user_token, new_cell_id, df)
# time.sleep(60)
# response, status = get_amplabs_meta(user_token, new_cell_id)


Initializing upload...Done.
Starting upload...Done.


In [9]:
get_amplabs_meta(user_token, new_cell_id)


({'detail': 'Records Retrieved',
  'records': [[{'active_mass': None,
     'ah': 0,
     'anode': None,
     'cathode': None,
     'cell_id': 'NEW_SNL_18650_NMC_15C_0-100_0.5/1C_a',
     'form_factor': None,
     'index': 52,
     'source': None,
     'test': 'cycle',
     'tester': None,
     'type': 'private'}]],
  'status': 200},
 1)