# Lecture 1 – Introduction

## Data 6, Summer 2022

This is a Jupyter notebook. We'll write all of our code for this class in Jupyter notebooks.

Today, don't worry about how any of this works. Throughout the summer, we'll learn how each of these pieces works.

**Note: If you're having trouble loading any plots or maps, try using Google Chrome.**

In [97]:
from datascience import *
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.graph_objects as go

## California universities

Here, we'll load in data about all public universities in California. The data comes from [this Wikipedia article](https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_California).

In [52]:
uni = Table.read_table('data/california_universities.csv') # Load in the "california_universities.csv" file in the "data" folder

# Remove irregular formatting
uni = uni.with_columns(
    'Enrollment', uni.apply(lambda s: int(s.replace(',', '')), 'Enrollment'),
    'Founded', uni.apply(lambda s: int(s.replace('*', '')), 'Founded')
)
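
To see what the two cleanup steps above do, here is a minimal sketch on standalone strings. The sample values `'21,812'` and `'1857*'` are made up for illustration; they are shaped like the entries being cleaned, but they are not read from the CSV file.

```python
# Hypothetical raw values, shaped like the entries cleaned above
raw_enrollment = '21,812'   # a number written with a thousands separator
raw_founded = '1857*'       # a year with a trailing footnote marker

# Remove the extra character, then convert the remaining text to an integer
enrollment = int(raw_enrollment.replace(',', ''))
founded = int(raw_founded.replace('*', ''))

print(enrollment, founded)
```

Without the `replace` step, `int('21,812')` raises an error, because Python does not accept commas inside a number.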

Data is often stored in tables. In a few weeks, we'll become very, very familiar with how tables work. But for now, let's just observe.

In [None]:
... # Let's see what the table looks like

Let's start asking questions.

### What's the largest public university in California?

In [None]:
... # Largest university - table format

In [None]:
... # Can we visualize the sizes of each university?

### What's the oldest public university in California? 🤔

In [None]:
... # Oldest university - table format

In [None]:
... # How can we visualize the ages of the universities?

Let's add some spice.

In [None]:
# Just run me (uses `uni_copy`, which must already be defined in an earlier cell)
fig = go.Figure()

fig.add_trace(
    go.Scatter(x = uni_copy.column('Founded'), 
               y = uni_copy.column('Total Universities'), 
               hovertext = uni_copy.column('Name'),
               mode = 'markers',
              )
)

fig.add_trace(
    go.Scatter(x = uni_copy.column('Founded'), 
               y = uni_copy.column('Total Universities'),
               line = dict(color = 'blue'),
              )
)

fig.update_layout(title = 'Total Number of Public Universities in California by Year',
                  xaxis_title = 'Year',
                  yaxis_title = 'Total Universities')

fig.show()

## Public Universities in California (and you!)

### Where are the public universities in California located?

First, we need some additional information:

In [53]:
uni_locations = Table.read_table('data/uni_locations.csv')
uni_locations

Latitude,Longitude,University
37.8719,-122.259,"University of California, Berkeley"
38.5382,-121.762,"University of California, Davis"
33.6405,-117.844,"University of California, Irvine"
34.0689,-118.445,"University of California, Los Angeles"
37.3661,-120.422,"University of California, Merced"
33.9737,-117.328,"University of California, Riverside"
32.8801,-117.234,"University of California, San Diego"
34.414,-119.849,"University of California, Santa Barbara"
36.9881,-122.058,"University of California, Santa Cruz"
38.0689,-122.23,California State University Maritime Academy


Let's combine some data.

In [54]:
unis_with_location = uni.join("Name", uni_locations, "University")
unis_with_location

Name,City,County,Enrollment,Founded,Latitude,Longitude
California Polytechnic State University,San Luis Obispo,San Luis Obispo,21812,1901,35.305,-120.662
"California State Polytechnic University, Pomona",Pomona,Los Angeles,26443,1938,34.0589,-117.819
California State University Channel Islands,Camarillo,Ventura,7095,2002,34.1621,-119.043
California State University Maritime Academy,Vallejo,Solano,1017,1929,38.0689,-122.23
California State University San Marcos,San Marcos,San Diego,14511,1988,33.1295,-117.16
"California State University, Bakersfield",Bakersfield,Kern,10493,1965,35.3487,-119.103
"California State University, Chico",Chico,Butte,17488,1887,39.7298,-121.846
"California State University, Dominguez Hills",Carson,Los Angeles,15741,1960,33.8662,-118.257
"California State University, East Bay",Hayward,Alameda,14525,1959,37.6571,-122.057
"California State University, Fresno",Fresno,Fresno,24995,1911,36.8134,-119.746
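
The join above matches each row of `uni` with the row of `uni_locations` that has the same university name. A rough plain-Python sketch of that idea follows. It is not how the `datascience` library implements `join`. The two matching rows reuse values shown in the output above, and the Berkeley location row is deliberately given no partner here, just to show what happens to an unmatched row.

```python
# Two tiny "tables", each written as a list of rows
uni_rows = [
    {'Name': 'California State University Maritime Academy', 'Enrollment': 1017, 'Founded': 1929},
    {'Name': 'California State University, Chico', 'Enrollment': 17488, 'Founded': 1887},
]
location_rows = [
    {'University': 'California State University Maritime Academy', 'Latitude': 38.0689, 'Longitude': -122.23},
    {'University': 'California State University, Chico', 'Latitude': 39.7298, 'Longitude': -121.846},
    # No matching entry in uni_rows (for illustration only)
    {'University': 'University of California, Berkeley', 'Latitude': 37.8719, 'Longitude': -122.259},
]

# Look up location rows by university name
locations_by_name = {row['University']: row for row in location_rows}

joined = []
for row in uni_rows:
    match = locations_by_name.get(row['Name'])
    if match is not None:  # keep only names that appear in both tables
        combined = dict(row)
        combined['Latitude'] = match['Latitude']
        combined['Longitude'] = match['Longitude']
        joined.append(combined)

for row in joined:
    print(row)
```

Each row in `joined` carries the columns of both tables, and the name appears once, under `Name`.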


In [55]:
unis_with_location.labels

('Name', 'City', 'County', 'Enrollment', 'Founded', 'Latitude', 'Longitude')

What if we want to plot these on a map?

We can use `plotly`, a library that adds interactive plotting tools to Python!

In [56]:
unis_with_location.column("Enrollment")

array([21812, 26443,  7095,  1017, 14511, 10493, 17488, 15741, 14525,
       24995, 39774, 36846, 27685,  7079, 38716, 31131, 19973, 10214,
        7774, 34881, 29586, 32828,  9201, 42519, 39152, 35220, 45428,
        8544, 23278, 38798, 24346, 19700])

In [99]:
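def bubble_plot(tbl, text, size, lat="Latitude", lon="Longitude", color=None, title=None, scale_factor=150):
    """Return a plotly map with one bubble per row of tbl, sized by the `size` column."""
    fig = go.Figure()
    
    # Use a single default color when no color column is given
    if not color:
        color_arr = ['royalblue'] * tbl.num_rows
    else:
        # Otherwise, read each bubble's color from the named column
        color_arr = tbl.column(color)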

    fig = fig.add_trace(go.Scattergeo(
                            lat = tbl.column(lat), 
                            lon = tbl.column(lon),
                            text = tbl.column(text),
                            marker = dict(
                                size = tbl.column(size) / scale_factor,
                                sizemode = 'area',
                                color = color_arr
                            )
                        ))

    fig.update_geos(fitbounds="locations")
    fig.update_layout(
        geo = dict(
                scope = 'usa',
                landcolor = 'rgb(217, 217, 217)',
            ),
        title = title
    )
    
    return fig


In [101]:
fig = bubble_plot(unis_with_location, text="Name", size="Enrollment", title="Public Universities in California")
fig.show()
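
Inside `bubble_plot`, each bubble's size is the value in the `size` column divided by `scale_factor`, and `sizemode = 'area'` tells `plotly` to treat that number as the bubble's area rather than its diameter. Here is a small sketch of the arithmetic, using the smallest, first, and largest enrollment values from the array shown earlier. The variable names are only for illustration.

```python
scale_factor = 150  # the default value in bubble_plot

# Smallest, first, and largest enrollments from the array above
enrollments = [1017, 21812, 45428]

# Same division that bubble_plot applies to the whole column
marker_sizes = [e / scale_factor for e in enrollments]

for e, s in zip(enrollments, marker_sizes):
    print(e, round(s, 2))
```

A larger `scale_factor` shrinks every bubble by the same proportion, so the universities keep their relative sizes.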

Can we add more information?

In [71]:
unis_with_color = unis_with_location.with_column('Color', ['crimson'] * unis_with_location.num_rows)
unis_with_color

Name,City,County,Enrollment,Founded,Latitude,Longitude,Color
California Polytechnic State University,San Luis Obispo,San Luis Obispo,21812,1901,35.305,-120.662,crimson
"California State Polytechnic University, Pomona",Pomona,Los Angeles,26443,1938,34.0589,-117.819,crimson
California State University Channel Islands,Camarillo,Ventura,7095,2002,34.1621,-119.043,crimson
California State University Maritime Academy,Vallejo,Solano,1017,1929,38.0689,-122.23,crimson
California State University San Marcos,San Marcos,San Diego,14511,1988,33.1295,-117.16,crimson
"California State University, Bakersfield",Bakersfield,Kern,10493,1965,35.3487,-119.103,crimson
"California State University, Chico",Chico,Butte,17488,1887,39.7298,-121.846,crimson
"California State University, Dominguez Hills",Carson,Los Angeles,15741,1960,33.8662,-118.257,crimson
"California State University, East Bay",Hayward,Alameda,14525,1959,37.6571,-122.057,crimson
"California State University, Fresno",Fresno,Fresno,24995,1911,36.8134,-119.746,crimson


In [102]:
fig = bubble_plot(unis_with_color, text="Name", size="Enrollment", color="Color", title="Public Universities in California")
fig.show()

It would be nice if this were color-coded based on UC vs. CSU. We can do that!

In [82]:
'University of California' in 'University of California, Davis'

True

In [90]:
def code_uc(name):
    if 'University of California' in name:
        return 'royalblue'
    else:
        return 'crimson'
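
As a quick check of the function above, here is a sketch that applies the same rule to two names taken from the tables in this notebook. The function is repeated so that the example runs on its own.

```python
def code_uc(name):
    # Names containing 'University of California' get blue; all others get red
    if 'University of California' in name:
        return 'royalblue'
    else:
        return 'crimson'

names = [
    'University of California, Davis',
    'California State University, Chico',
]
colors = [code_uc(name) for name in names]

print(colors)
```

The `in` check is case-sensitive, so the capitalization in the name has to match exactly.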

In [92]:
uni_locations_separate = unis_with_color.with_column('Color', unis_with_color.apply(code_uc, 'Name'))
uni_locations_separate

Name,City,County,Enrollment,Founded,Latitude,Longitude,Color
California Polytechnic State University,San Luis Obispo,San Luis Obispo,21812,1901,35.305,-120.662,crimson
"California State Polytechnic University, Pomona",Pomona,Los Angeles,26443,1938,34.0589,-117.819,crimson
California State University Channel Islands,Camarillo,Ventura,7095,2002,34.1621,-119.043,crimson
California State University Maritime Academy,Vallejo,Solano,1017,1929,38.0689,-122.23,crimson
California State University San Marcos,San Marcos,San Diego,14511,1988,33.1295,-117.16,crimson
"California State University, Bakersfield",Bakersfield,Kern,10493,1965,35.3487,-119.103,crimson
"California State University, Chico",Chico,Butte,17488,1887,39.7298,-121.846,crimson
"California State University, Dominguez Hills",Carson,Los Angeles,15741,1960,33.8662,-118.257,crimson
"California State University, East Bay",Hayward,Alameda,14525,1959,37.6571,-122.057,crimson
"California State University, Fresno",Fresno,Fresno,24995,1911,36.8134,-119.746,crimson


In [103]:
fig = bubble_plot(uni_locations_separate, text="Name", size="Enrollment", color="Color", title="UCs and CSUs")
fig.show()

Voilà!

### Where are you all from?