In [None]:
# Import the modules we need before we start
from datascience import Table

import matplotlib  # the matplotlib lines set up plotting
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('fivethirtyeight')  # 'fivethirtyeight' is one of matplotlib's built-in styles;
                                  # see https://tonysyu.github.io/raw_content/matplotlib-style-gallery/gallery.html for other styles



### Demography 180 - Social Networks
### Lab 1 -- part 1

Welcome to the first lab for Demography 180 - Social Networks!

Part 1 of Lab 1 is a reference for learning the `Table` class in the `datascience` module. If you don't have much experience with this module, use it for practice now and revisit it when you work on Part 2 and hw02.

You don't have to submit Part 1, but remember to submit Lab 1 Part 2 by the deadline!

# 0 The datascience module

We usually analyze data in the form of tables. The `Table` class in the `datascience` module is a very helpful tool for this lab and for later work in this class. This section introduces some basic elements and syntax for using `Table`.

## 0.1 Creating the tables
A Table is a sequence of labeled columns of data.

A Table can be constructed from scratch by extending an empty table with columns.

In [None]:
# Construct a table from scratch and assign it to the name 't'
t = Table().with_columns([
    'letter', ['a','b','c','z'], # text in quotes ('') is a string; words without quotes are variable or function names
    'count', [9, 3, 3, 1],
    'points', [1, 2, 2, 10],
])

In [None]:
# Let's see what you constructed
print(t)

More often, we load the dataset we want to analyze from a file by passing its path to `Table.read_table()`.

In [None]:
Table.read_table('ucb_fa2022_personal_networks_clean.csv') # pass in the location of the data and its file name

Let's work with the first 10 rows of this data as an example to learn basic syntax.

In [None]:
# Assign this table to a name called survey
# NOTE: this path needs to be updated for this year's data
survey = Table.read_table('ucb_fa2022_personal_networks_clean.csv')

# Then choose the first 10 rows as an example
example = survey.take[:10]

# Take a look at this dataset called example
example

## 0.2 Accessing values in tables


To access the values in a column, use `column()`, which takes a column label or index and returns an array. Alternatively, `columns` (a property, so no parentheses) returns a tuple of all the columns as arrays.

Let's use `example` data.

In [None]:
example.column('respondent_gender') # returns the values in the column labeled "respondent_gender" as an array. Use quotes because you are referring to a label

You can also refer to the "respondent_gender" column by its index.
Indexing starts from 0: the "interview_number" column, which is the 1st column, is `column(0)`.

In [None]:
example.column(3) # indexing takes numbers, don't put numbers in quotes!

In [None]:
example.column(0) 

In [None]:
example.column(1)

As a shorthand, square brackets do the same thing as `.column()`:

In [None]:
example['interview_number']

In [None]:
example[0]

To access values by row, `row()` takes a row index and returns that row. `row(index)` and `rows[index]` refer to the same row.

Alternatively, `rows` (a property, so no parentheses) returns a list-like **Rows** object that contains tuple-like **Row** objects. You can index into it to get individual rows.

In [None]:
example.rows[0]

In [None]:
example.row(0)

In [None]:
first = example.rows[0]
first[0] # this will show the first element in the first row of example, which is example.rows[0]

You can get the number of rows:

In [None]:
example.num_rows

## 0.3 Manipulating data

Adding a column:

In [None]:
example = example.with_column('interviewer_id', [1,1,1,2,2,2,3,3,3,4]) # here we add a column assigning an interviewer ID (1 to 4) to each of the 10 rows
                                                             # You will see this column on the right of the table (scroll to the right)

In [None]:
# NOTE: .with_column() returns a new table without modifying the original. Because we reassigned the result to example above, example now includes the new column.
example

Let's reduce the table to only contain the information about the respondents for a simpler illustration.

You select columns with `select()`.

In [None]:
example.select(['interview_number','respondent_gender','respondent_age','respondent_class',
               'respondent_home'])

In [None]:
# Let's assign this as a new dataset
example2 = example.select(['interview_number','respondent_gender','respondent_age','respondent_class',
               'respondent_home'])

You can rename columns with `relabeled()`:

In [None]:
example2

In [None]:
example2.relabeled('respondent_home', 'respondent_residency')

In [None]:
# Like .with_column(), relabeled() doesn't change the original example2
example2

You can select rows by index with `take()` and by condition with `where()`:

In [None]:
example2.take(2) # returns the 3rd row (indexing starts from 0)

In [None]:
example2.where(example2.column('respondent_gender') == 'Female')

In [None]:
example2.where('respondent_gender', 'Female') # returns the rows where the respondent_gender is "Female"

In [None]:
example2.where(example2['respondent_age'] == 20) # example2['respondent_age'] refers to the values in the 'respondent_age' column, which are compared with the value 20
                        # this whole expression returns the rows where respondent_age is equal to 20
                        # see below for what the inner expression returns
                        # PS: this array of True/False values is called a "boolean mask"

In [None]:
# Let's see what example2['respondent_age'] == 20 returns
example2['respondent_age'] == 20 # the result of the comparison is an array of booleans (True or False), one per row

You can further operate on table data with `sort()`, `group()`, and `pivot()`.

In [None]:
example2.sort('respondent_age') # this function sorts the table based on the values in 'respondent_age' column

In [None]:
example2.sort('respondent_gender', descending=True) # descending=True sorts text in reverse alphabetical order (z to a)

In [None]:
# You may pass a reducing function into the collect argument.
# Note that the collected column is relabeled because of the collect argument
# (here 'respondent_age' becomes 'respondent_age min').

example2.select(['respondent_home','respondent_age']).group('respondent_home', collect=min) # this line selects the two columns 'respondent_home' and 'respondent_age',
                                                           # then groups the rows by 'respondent_home'
                                                           # and reports the smallest 'respondent_age' in each group.

In [None]:
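# You can use pivot() to cross-tabulate: each value of 'respondent_home' becomes a column,
# each value of 'respondent_gender' becomes a row, and each cell counts the matching respondents.
example2.pivot('respondent_home','respondent_gender')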

In [None]:
# We can also pivot the whole dataset
survey.pivot('respondent_home', 'respondent_gender')