## 3. Analyze Data

![list](https://apmonitor.com/che263/uploads/Begin_Python/list.png)

Once data is read into Python, a typical first step is to summarize it with statistical analysis. This is especially useful when the data is too large or inconvenient to view in a spreadsheet such as Microsoft Excel (limited to 1,048,576 rows and 16,384 columns). Summary statistics include the count, mean, standard deviation, maximum, minimum, and quartile information for each data column. Use the `requests` module to download the [sample data](https://apmonitor.com/pdc/uploads/Main/tclab_data2.txt) and save it as `03-tclab.csv` in the local directory.

In [None]:
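import requests
url = 'http://apmonitor.com/pdc/uploads/Main/tclab_data2.txt'
r = requests.get(url)
r.raise_for_status()  # stop here if the download failed
# save the downloaded bytes to a local file
with open('03-tclab.csv', 'wb') as f:
    f.write(r.content)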

#### Use `numpy` to analyze data

The `np.loadtxt` function reads the TCLab CSV data file `03-tclab.csv`. The numpy functions `size` (dimensions), `mean` (average), `std` (standard deviation), and `median` give summary statistics. If you don't specify the `axis`, `numpy` computes a single statistic over all of the values in the array. With `axis=0` the statistic is taken down the rows and gives one value per column; with `axis=1` it is taken across the columns and gives one value per row.

In [None]:
import numpy as np
data = np.loadtxt('03-tclab.csv',delimiter=',',skiprows=1)

print('Dimension (rows=0,columns=1):')
print(np.size(data,0),np.size(data,1))

print('Average:')
print(np.mean(data,axis=0))

print('Standard Deviation:')
print(np.std(data,0))

print('Median:')
print(np.median(data,0))
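
The effect of the `axis` argument is easier to see on a small array. The sketch below uses a made-up 3x2 array named `demo` (illustrative values, not TCLab data) and compares the mean with no `axis`, with `axis=0`, and with `axis=1`.

```python
import numpy as np

# Made-up array: 3 rows (samples) by 2 columns (signals)
demo = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

overall = np.mean(demo)              # no axis: one mean over all 6 values
per_column = np.mean(demo, axis=0)   # collapse the rows: one mean per column
per_row = np.mean(demo, axis=1)      # collapse the columns: one mean per row

print('Overall:', overall)
print('Per column:', per_column)
print('Per row:', per_row)
print('Total elements:', np.size(demo), 'Shape:', demo.shape)
```

For a data file with one signal per column, `axis=0` is usually the one you want because it returns one statistic for each signal.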

#### Use `pandas` to analyze data

Pandas simplifies the data analysis with `.describe()`, a method of the `DataFrame` that `pd.read_csv()` creates. The data source can be either a local file name or a web address such as

```python
url='https://apmonitor.com/pdc/uploads/Main/tclab_data2.txt'
data = pd.read_csv(url)
```

In [None]:
import pandas as pd
data = pd.read_csv('03-tclab.csv')
data.describe()
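
Individual statistics are also available as methods on a `DataFrame` column. The sketch below uses a small made-up `DataFrame` named `demo` (the values are illustrative, not measured data). It also shows one difference to keep in mind when comparing results: `pandas` reports the sample standard deviation (`ddof=1`) by default, while `np.std` uses the population form (`ddof=0`), so the two numbers differ slightly.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for a heater and a temperature column
demo = pd.DataFrame({'Q1': [0, 50, 50, 80],
                     'T1': [20.0, 25.0, 35.0, 40.0]})

summary = demo.describe()                 # count, mean, std, min, quartiles, max
t1_mean = demo['T1'].mean()               # single statistic for one column
sample_std = demo['T1'].std()             # pandas default: ddof=1
pop_std = np.std(demo['T1'].to_numpy())   # numpy default: ddof=0

print(summary)
print('T1 mean:', t1_mean)
print('T1 std (pandas, numpy):', sample_std, pop_std)
```

Passing `ddof=1` to `np.std` gives the same value that `pandas` reports.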

### Activity

![expert](https://apmonitor.com/che263/uploads/Begin_Python/expert.png)

Generate a file of TCLab data with the time in seconds (`t`), heater levels (`Q1` and `Q2`), and temperatures (`lab.T1` and `lab.T2`). Record data every second for 120 seconds and change the heater levels every 30 seconds to a random integer between 0 and 80 with `np.random.randint()`. The upper bound of `np.random.randint(0,81)` is exclusive, so 80 is the largest value it returns. There is no need to change this program; just run it for 2 minutes to collect the data.

In [None]:
import tclab, time, csv
import numpy as np
n = 120 
with open('03-tclab_new.csv',mode='w',newline='') as f:
    cw = csv.writer(f)
    cw.writerow(['Time','Q1','Q2','T1','T2'])
    with tclab.TCLab() as lab:
        print('t Q1 Q2 T1    T2')
        for t in range(n):
            if t%30==0:
                Q1 = np.random.randint(0,81)
                Q2 = np.random.randint(0,81)
                lab.Q1(Q1); lab.Q2(Q2)
            cw.writerow([t,Q1,Q2,lab.T1,lab.T2])
            if t%5==0:
                print(t,Q1,Q2,lab.T1,lab.T2)
            time.sleep(1)

Read the file `03-tclab_new.csv` and display summary statistics with `data.describe()`. If you do not have a TCLab device, read the data file from the web address with `data=pd.read_csv('http://apmonitor.com/pdc/uploads/Main/tclab_data2.txt')`.
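
One possible way to handle both cases is sketched below. The helper name `load_tclab_data` and its arguments are hypothetical, introduced only for illustration; it reads the recorded file when it exists and otherwise falls back to the sample data address given above.

```python
import os
import pandas as pd

def load_tclab_data(local_file='03-tclab_new.csv',
                    fallback_url='http://apmonitor.com/pdc/uploads/Main/tclab_data2.txt'):
    """Hypothetical helper: read the recorded file if present, else the sample URL."""
    # Prefer the file recorded from the TCLab device
    if os.path.exists(local_file):
        source = local_file
    else:
        # No recorded file (for example, no device): use the online sample data
        source = fallback_url
    return pd.read_csv(source)
```

Calling `data = load_tclab_data()` followed by `data.describe()` then displays the summary statistics for whichever source was available.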