# read_data.ipynb

This notebook reads 3 tables from the Google Sheet specified by **SHEET_ID**.

The tree tables, **Beetles**, **Observations** and **Mass** are read into panasa dataframes and stored locally in **beetles.csv**, **observations.csv**, and **mass.csc**. These **csv** files should be used for further analysis.

### Data Dictionary

**Beetles table**
* ID: beetle ID (unique integer)
* Group: flight test group (A, B, C, D)
* Sex: M, F
* date_treated
* date_dead
* OrNV detected: t, f
* Notes: if this field contains "missing at end of expt", the record for beetle is excluded from **beetles.csv**

**Observations table**

This table is not in standard format. It is a matrix with 'ID' as the first column.
The remaining columns labeled '2023/02/28', '2023/03/01', ... contain observation codes for each beetle.
The dates indicate when the observation was made. 
For example, an F in column 2023/02/28 indicates that this beetle flew during the previous night ( 2023-02-27 19:00 to 2023-02-28 07:00. 

* F: beetle flew (collected from bottom of chamber)
* N: beetle did not fly (collected from paint bucket)
* D: dead
* M: missing

**Mass table**


In [17]:
import pandas as pd

# GET DATA

In [18]:
SHEET_ID = '1jwgm7h_-Al4MspsfC4sP6E03QrjpZcTPr2JC7WLU2QM'

In [19]:
def get_google_sheet(sheet_id, sheet_name):
    """
    Returns a data frame generated from a Google sheet
    """
    url = f'https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}'
    return pd.read_csv(url)

# SHEET_ID = '1jwgm7h_-Al4MspsfC4sP6E03QrjpZcTPr2JC7WLU2QM'
# get_google_sheet(SHEET_ID, 'Beetles')

In [21]:
# get 'beetles' table

df_beetles = get_google_sheet(SHEET_ID, 'Beetles')    
# remove columns after 'Notes'
df_beetles = df_beetles.loc[:,:'Notes']
# Convert dates from string to datetime
df_beetles['date treated'] = pd.to_datetime(df_beetles['date treated'], format='%Y/%m/%d')
df_beetles['date dead'] = pd.to_datetime(df_beetles['date dead'], format='%Y/%m/%d')
# Remove records for beetles missing at end of experiment
df_beetles[~df_beetles.Notes.str.contains('missing at end of expt', na=False)]
# save to disk
df_beetles.to_csv('beetles.csv', index=False)
print('Beetles sheet downloaded and saved to beetles.csv')
df_beetles

Beetles sheet downloaded and saved to beetles.csv


Unnamed: 0,ID,Group,Sex,date treated,date dead,OrNV detected,Notes
0,1,A,F,2023-03-06,2023-04-10,,
1,2,A,M,2023-03-06,2023-03-27,,
2,3,A,F,2023-03-06,2023-04-17,,
3,4,A,M,2023-03-06,2023-03-26,,
4,5,A,F,2023-03-06,NaT,,alive at end of expt
...,...,...,...,...,...,...,...
110,111,D,M,2023-03-06,NaT,,missing at end of expt
111,112,D,M,2023-03-06,2023-03-27,,
112,113,D,F,2023-03-06,2023-03-26,,
113,114,D,M,2023-03-06,2023-04-12,,


In [22]:
# get observations table

df_observations = get_google_sheet(SHEET_ID, 'Observations')
# No idea why the first column is unnamed, but this fixes the problem
df_observations.rename(columns={'Unnamed: 0':'ID'}, inplace=True)
# Drop all columns which do not contain data
df_observations.dropna(axis='columns', how='all', inplace=True)
# Convert the table into a more standard format
df_observations = df_observations.melt(id_vars='ID')
# Rename
df_observations.rename(columns={'variable':'date', 'value':'obs'}, inplace=True)
# Convert date from string to datetime
df_observations['date'] = pd.to_datetime(df_observations['date'], format='%Y/%m/%d')    
df_observations.to_csv('observations.csv', index=False)
print('Observations sheet downloaded and saved to observations.csv')
df_observations

Observations sheet downloaded and saved to observations.csv


Unnamed: 0,ID,date,obs
0,1,2023-02-28,N
1,2,2023-02-28,N
2,3,2023-02-28,N
3,4,2023-02-28,N
4,5,2023-02-28,N
...,...,...,...
5630,111,2023-04-17,N
5631,112,2023-04-17,
5632,113,2023-04-17,
5633,114,2023-04-17,


In [30]:
# get mass table

df_mass = get_google_sheet(SHEET_ID, 'Mass')
df_mass = pd.melt(df_mass, id_vars=['Unnamed: 0'], var_name='date', value_name='milligrams')
df_mass.rename({'Unnamed: 0':'ID'}, axis='columns', inplace=True)
df_mass.dropna(inplace=True) 
# Convert date from string to datetime
df_mass['date'] = pd.to_datetime(df_mass['date'], format='%Y/%m/%d')    
# save to disk
df_mass.to_csv('mass.csv', index=False)
print('Mass sheet downloaded and saved to mass.csv')
df_mass

Mass sheet downloaded and saved to mass.csv


Unnamed: 0,ID,date,milligrams
0,1,2023-03-13,2480.4
1,2,2023-03-13,2879.3
2,3,2023-03-13,5122.6
3,4,2023-03-13,3811.9
4,5,2023-03-13,5063.8
...,...,...,...
895,5,2023-04-17,3410.0
929,42,2023-04-17,3897.0
940,55,2023-04-17,4267.0
943,60,2023-04-17,3036.0


In [25]:
df_beetles.ID.nunique()

115

In [27]:
df_observations.ID.nunique()

115

In [28]:
df_mass.ID.nunique()

98