![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-science-and-artificial-intelligence&branch=main&subPath=07-primary-data.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Primary Data

So far we have been using data collected by others (**secondary data**), but we can also use data that we collect ourselves (**primary data**). 

Here are three ways to do this:
1. Google Sheets
1. EtherCalc
1. Other Online Files


## 1. Google Sheets

One way is to read data from a Google Sheet. First you need to make the Sheet public:

1. Click the `Share` button
1. Under `General access`, click the down arrow.
1. Choose `Anyone with the link`.
1. To the right of that, select `Viewer`.
1. Click `Copy link`.
1. Paste the link between the `'` marks in the first line of the code cell below.

**Remove the current URL and replace it with your own link.**

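```python
# Replace this link with the one you copied from your own Google Sheet
sheet_link = 'https://docs.google.com/spreadsheets/d/11mulzUH3_cueq-V9D5KIlo9oHE9YYZrUSeVyCin7_rM/edit#gid=176703676'

import pandas as pd
# Change the "edit" part of the link so Google returns the sheet as a CSV file
data = pd.read_csv(sheet_link.replace('/edit#gid=', '/export?format=csv&gid='))
data
```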

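|       | geo | name        | time | Life expectancy |
|-------|-----|-------------|------|-----------------|
| 0     | afg | Afghanistan | 1800 | 28.21           |
| 1     | afg | Afghanistan | 1801 | 28.20           |
| 2     | afg | Afghanistan | 1802 | 28.19           |
| 3     | afg | Afghanistan | 1803 | 28.18           |
| 4     | afg | Afghanistan | 1804 | 28.17           |
| ...   | ... | ...         | ...  | ...             |
| 56125 | zwe | Zimbabwe    | 2096 | 75.12           |
| 56126 | zwe | Zimbabwe    | 2097 | 75.25           |
| 56127 | zwe | Zimbabwe    | 2098 | 75.38           |
| 56128 | zwe | Zimbabwe    | 2099 | 75.52           |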
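
The code above only converts links that contain `/edit#gid=`. Links copied with the `Copy link` button often end in `/edit?usp=sharing` instead, and `replace` leaves those unchanged, so pandas would likely download a web page instead of CSV data. Here is a minimal sketch of a more forgiving converter. The function name `sheet_link_to_csv_url` is hypothetical, and it assumes the usual Google Sheets link shape `.../spreadsheets/d/<ID>/...`:

```python
import re

def sheet_link_to_csv_url(sheet_link):
    """Build a CSV-export URL from a Google Sheets link (hypothetical helper)."""
    # The spreadsheet ID is the part after /spreadsheets/d/ and before the next /
    match = re.search(r'/spreadsheets/d/([a-zA-Z0-9_-]+)', sheet_link)
    if match is None:
        raise ValueError('This does not look like a Google Sheets link: ' + sheet_link)
    csv_url = 'https://docs.google.com/spreadsheets/d/' + match.group(1) + '/export?format=csv'
    # If the link names a specific tab (gid=...), export that tab instead of the first one
    gid = re.search(r'gid=([0-9]+)', sheet_link)
    if gid is not None:
        csv_url += '&gid=' + gid.group(1)
    return csv_url

print(sheet_link_to_csv_url('https://docs.google.com/spreadsheets/d/11mulzUH3_cueq-V9D5KIlo9oHE9YYZrUSeVyCin7_rM/edit?usp=sharing'))
```

With this helper, loading the sheet would become `data = pd.read_csv(sheet_link_to_csv_url(sheet_link))`, whichever form of link you pasted.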


## 2. EtherCalc

You can also use [EtherCalc](https://ethercalc.net/) to create a public spreadsheet that we can read data from.

`Run` the following code cell to get a link to your EtherCalc sheet. The code only builds the link; EtherCalc sets up the sheet when you open it. You can change `my_callysto_data` to any name you'd like for your spreadsheet.

```python
# Change this to any name you like for your spreadsheet
spreadsheet_name = 'my_callysto_data'

print('Your spreadsheet is available at https://ethercalc.net/' + spreadsheet_name)
```

Your spreadsheet is available at https://ethercalc.net/my_callysto_data


After you have added data to that spreadsheet, you can load it into a dataframe using the following code cell.

```python
import pandas as pd

# EtherCalc can export a sheet as an Excel file when .xlsx is added to its link
xlsx_url = 'https://ethercalc.net/' + spreadsheet_name + '.xlsx'
data = pd.read_excel(xlsx_url)
data
```

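|   | Unnamed: 0          | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 |
|---|---------------------|------------|------------|------------|------------|------------|
| 0 | 2023-03-25 21:35:18 | 1          | 2          | 4          | 8          | 16         |
| 1 | 2023-03-25 21:35:46 | 1          | 2          | 4          | 8          | 16         |
| 2 | 2023-03-25 21:35:57 | 1          | 2          | 4          | 8          | 16         |
| 3 | 2023-03-25 21:36:06 | 1          | 2          | 4          | 8          | 16         |
| 4 | 2023-03-25 21:36:12 | 1          | 2          | 4          | 8          | 16         |
| 5 | 2023-03-25 21:36:54 | 1          | 2          | 4          | 8          | 16         |
| 6 | 2023-03-25 21:37:29 | 1          | 2          | 4          | 8          | 16         |
| 7 | 2023-03-25 21:37:34 | 1          | 2          | 4          | 8          | 16         |
| 8 | 2023-03-25 21:38:21 | 1          | 2          | 4          | 8          | 16         |
| 9 | 2023-03-25 21:38:34 | 1          | 2          | 4          | 8          | 16         |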
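
pandas uses the first row of a sheet as column names and labels any empty header cells `Unnamed: 0`, `Unnamed: 1`, and so on, which is likely why the columns above have those names. A minimal sketch of one way to choose your own names, assuming your sheet has no header row: pass `header=None` so every row is treated as data, and `names=` to label the columns. The column names below (`time`, `obs1` to `obs5`) are hypothetical, and the two sample rows are copied from the output above so the example runs without the internet:

```python
import io
import pandas as pd

# Two rows copied from the EtherCalc output above, written as CSV text
csv_text = """2023-03-25 21:35:18,1,2,4,8,16
2023-03-25 21:35:46,1,2,4,8,16
"""

# header=None: treat every row as data; names=: our own (hypothetical) column labels
column_names = ['time', 'obs1', 'obs2', 'obs3', 'obs4', 'obs5']
data = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names,
                   parse_dates=['time'])  # turn the time column into real dates
print(data)
```

With the live sheet, the same arguments should work with `pd.read_excel(xlsx_url, header=None, names=column_names)`, since `read_excel` also accepts `header` and `names`.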


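It's even possible to write to an EtherCalc sheet using Python code. The following cell adds a new row containing the current date and time, followed by the numbers in `observations`.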

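```python
# Values to add as a new row in the spreadsheet
observations = [1, 2, 4, 8, 16]

import requests
from datetime import datetime

base_url = 'https://ethercalc.net/'
post_url = base_url + '_/' + spreadsheet_name  # address that accepts new rows for this sheet
print('posting to', base_url + spreadsheet_name)

# Build one comma-separated row: the current date and time, then each observation
date_and_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
data_string = ','.join([date_and_time] + [str(observation) for observation in observations])

# Send the row to EtherCalc, which adds it to the sheet
r = requests.post(post_url, data=data_string)
r.raise_for_status()  # stop with an error if the request failed
print('successful:', r.json()['command'][1])
```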

posting to https://ethercalc.net/my_callysto_data
successful: paste A15 all
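
Joining values with commas works for numbers, but a text observation that itself contains a comma (such as `sunny, warm`) would be split across two cells. Python's built-in `csv` module handles this by putting quotes around such values. Here is a minimal sketch, where `make_csv_row` is a hypothetical helper name and we assume EtherCalc understands standard CSV quoting:

```python
import csv
import io
from datetime import datetime

def make_csv_row(observations, when=None):
    """Return one CSV line: a timestamp followed by each observation (hypothetical helper)."""
    if when is None:
        when = datetime.now()
    buffer = io.StringIO()
    # lineterminator='' keeps the result to a single line with no trailing newline
    writer = csv.writer(buffer, lineterminator='')
    writer.writerow([when.strftime('%Y-%m-%d %H:%M:%S')] + list(observations))
    return buffer.getvalue()

print(make_csv_row(['sunny, warm', 21.5, 3], datetime(2023, 3, 25, 21, 35, 18)))
# prints: 2023-03-25 21:35:18,"sunny, warm",21.5,3
```

To use it in the cell above, you could replace the `data_string` line with `data_string = make_csv_row(observations)`.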


## 3. Other Online Files

_3a. Load a file from GitHub_

Another option is to save the data as a [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) or [Excel](https://en.wikipedia.org/wiki/Microsoft_Excel#File_formats) file and upload it to an online file-sharing site such as [GitHub](https://github.com).

pandas can then load the file directly from its link. If you are using GitHub, make sure you copy the [raw link](https://docs.github.com/en/repositories/working-with-files/using-files/viewing-a-file#viewing-or-copying-the-raw-file-content).

Change `read_csv` to `read_excel` if it is an Excel file.

```python
import pandas as pd

# Use the raw link to the file; change read_csv to read_excel for an Excel file
data = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/data-science-and-artificial-intelligence/pets.csv')
data
```
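
If you copy a link from a GitHub file page, it usually looks like `https://github.com/user/repo/blob/branch/file.csv`, which points to a web page rather than the file itself. The raw link drops `/blob` and uses the `raw.githubusercontent.com` address instead. Here is a minimal sketch of that conversion; `github_to_raw` is a hypothetical helper name:

```python
def github_to_raw(url):
    """Convert a GitHub file-page link into its raw-content link (hypothetical helper)."""
    raw_prefix = 'https://raw.githubusercontent.com/'
    page_prefix = 'https://github.com/'
    if url.startswith(raw_prefix):
        return url  # already a raw link
    if not url.startswith(page_prefix):
        raise ValueError('Expected a link starting with ' + page_prefix)
    # Expected parts: user, repo, 'blob', branch, then the path to the file
    parts = url[len(page_prefix):].split('/')
    if len(parts) < 5 or parts[2] != 'blob':
        raise ValueError('Expected a link like https://github.com/user/repo/blob/branch/file')
    # Keep user/repo, drop 'blob', keep branch/path
    return raw_prefix + '/'.join(parts[:2] + parts[3:])

print(github_to_raw('https://github.com/callysto/data-files/blob/main/data-science-and-artificial-intelligence/pets.csv'))
# prints: https://raw.githubusercontent.com/callysto/data-files/main/data-science-and-artificial-intelligence/pets.csv
```

The printed link matches the raw link used in the cell above, so `pd.read_csv(github_to_raw(page_link))` would load the same data.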

_3b. Upload a file to the Callysto Hub_

If we don't want or need the data to be publicly accessible on the internet, a file can be uploaded to [this folder on the Callysto Hub](.) and then loaded as a local file.

1. Open [this folder on the Callysto Hub](.)
1. Click the `Upload` button at the top right.
1. A file picker window will open; select the file from your computer and click the `Open` button.
1. Click the blue `Upload` button that appears next to the file.
1. Change `my_data_file.csv` in the code below to the name of your file, and change `read_csv` to `read_excel` if it is an Excel file.

```python
import pandas as pd

# Change my_data_file.csv to the name of the file you uploaded
data = pd.read_csv('my_data_file.csv')
```

FileNotFoundError: [Errno 2] No such file or directory: 'my_data_file.csv'
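
The `FileNotFoundError` above appears when no file with that name is in the folder, for example before anything has been uploaded or when the name has a typo. Here is a minimal sketch of a loader that picks `read_csv` or `read_excel` from the file extension and lists the folder's contents when the file is missing. The name `load_data_file` is hypothetical, and it assumes your file ends in `.csv`, `.xlsx` or `.xls`:

```python
import os
import pandas as pd

def load_data_file(filename):
    """Load a CSV or Excel file from the current folder (hypothetical helper)."""
    if not os.path.exists(filename):
        # Show what is actually here, to help spot a typo in the name
        print('Files in this folder:', sorted(os.listdir('.')))
        raise FileNotFoundError(filename + ' was not found; check the name and the upload step.')
    extension = os.path.splitext(filename)[1].lower()
    if extension in ('.xlsx', '.xls'):
        return pd.read_excel(filename)
    return pd.read_csv(filename)
```

You would then call it as `data = load_data_file('my_data_file.csv')`, changing the name to match your uploaded file.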

---

<span style="color:#663399">Your **assignment** is to create a document with at least one visualization of your own data and write about what you notice.</span>

---

The [next notebook](08-natural-language-processing.ipynb) will introduce artificial intelligence and natural language processing.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)