# <center>Using Python: A Brief Introduction</center>

![](https://blogs.gartner.com/doug-laney/files/2015/01/big-data-word-cloud.jpg)


## <center><br/>Cara Marta Messina, Assistant Director, Digital Integration Teaching Initiative<br/>Northeastern University</center>

### <center>Prepared for <br/>March 10th 2020</center>

What do we mean when we say “Big Data?” How can “big data” be used in CSS? What can “big data” tell us about ourselves? 

* In recent years, a variety of novel digital data sources, colloquially referred to as “big data,” have taken the popular imagination by storm. 
* These data sources include, but are not limited to: digitized administrative records; activity on and contents of social media and internet platforms; and readings from sensors that track physical and environmental conditions. 
* Some have argued that such data sets have the potential to transform our understanding of human behavior and society, constituting a meta-field known as computational social science.
* Lying at the intersection of computer science, statistics and the social sciences, the emerging field of computational social science uses large-scale demographic, behavioral and network data to investigate human activity and relationships.

## Outline

1. Small group activity
2. Introduction to computational social science
3. “Big Data” and Data Collection
    - What can Facebook tell you about yourself?
    - What can Google tell you about yourself?  
    - Why is this useful? 
4. How will data analytics impact our future?
    - Example: Activity tracking products for health insurance
5. Live Example: Using computational methods to examine school shooting statistics and narratives
6. Introduction to Python & Jupyter Notebook



### Small Group Discussion:
In groups of 2-3 people, have a quick **5 minute discussion** to think about some (or all) of these questions: 
- What data do you think is being collected about you? 
- Where and how do you think this data is being collected? 
- Where is it stored? Who sees it? 
- How do you imagine your data is being used? 
- What is your reaction to this information being collected, stored, and potentially used?

![](https://thevpn.guru/wp-content/uploads/2017/08/The-Ugly-Truth-About-How-Facebook-Uses-Your-Private-Data.jpg)

## Social Media Ads

Social media apps collect and store a great deal of information about you: not only what you write or enter in your bio, but also which posts you click, what you search, who you follow, who follows you, and data aggregated from other sites. This data is used to build a profile that describes and categorizes who "you" are according to your activities on social media platforms and beyond. Part of the reason this data is collected is to create targeted, personalized ads. 

### Activity: What do your social media ads tell you about yourself?
Open up your social media platforms (Facebook, Twitter, YouTube, Instagram, Snapchat, etc.). Look for ads that appear on each platform. Choose a partner and discuss:
- What sorts of advertisements are you receiving?
- Which advertisements appeal to you? Which do not?
- Are these advertisements interesting, surprising, or not surprising? Why?
- Compare your advertisements across different social media platforms. How are they similar? How are they different?
- Why do you think this advertisement was targeted towards you?

**If you don't have any social media accounts, feel free to listen in on another conversation. We will show you our results shortly.**

Here are some examples of advertising:


![](./visuals/ad-IG1.jpg) 

![](./visuals/ad-IG2.jpg) 

![](./visuals/ad-TW1.jpg) 

![](./visuals/ad-TW2.jpg) 

## Google Timeline

Just as Facebook keeps track of the clicks you make, Google keeps track of the steps you take (as well as many other data points). By using Google Timeline, you can trace the routes you took. The example we are using is a trip Alexis took.


![](./visuals/google_alexisscreenshot1.png)
![](./visuals/google_alexisscreenshot2.png)


## How can collecting data on us impact us in the future?

![](https://i0.wp.com/9to5mac.com/wp-content/uploads/sites/6/2018/09/run.jpg?resize=1500%2C0&quality=82&strip=all&ssl=1)

### Activity tracker like iPhone or Apple Watch now mandatory for John Hancock life insurance 

* Policyholders score premium discounts for hitting exercise targets tracked on wearable devices such as a Fitbit or Apple Watch
* Raised questions about whether insurers may eventually use data to select the most profitable customers
* Will policyholders be penalised for walking through a sketchy area, logged by the GPS in their device? What about an activity tracker logging a strenuous hike as a risk factor? Or deciding that someone is cycling or skiing dangerously fast?


### China is building a digital dictatorship to exert control over its 1.4 billion citizens. For some, “social credit” will bring privileges — for others, punishment.

### #WhyIStayed: Survivors of Domestic Violence Relationships

![](./visuals/ethics_whyleft.png)


### Big Data in Healthcare: From the Opioid Crisis to Tracking the Flu

![](https://i1.wp.com/ems-solutionsinc.com/wp-content/uploads/2015/11/Screen-Shot-2015-11-23-at-5.23.57-AM.png?ssl=1) 


## Using Computational Methods to Examine School Shootings and the Media 

Grounded in theory and empirical background, the research examines several research questions: 
* Has the media coverage of school shootings increased over time?
* Has the media coverage of school shootings increased over time relative to the number of school shootings over time? 
* Has the framing of school shootings changed over time? 
    * Specifically, has the mention of mass shootings as a gun control issue versus a mental health issue increased, stayed the same, or decreased over time? 
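
The framing question above comes down to counting, per year, how many articles mention each frame. The sketch below shows one simple way to do that with keyword matching. It is not the code from the study: the `articles` list is invented for illustration, and the `frames` keyword lists are hypothetical stand-ins for whatever terms a real analysis would use.

```python
from collections import defaultdict

# Invented (year, text) pairs for illustration; NOT real articles
articles = [
    (1999, "Lawmakers debate gun control after the shooting"),
    (1999, "Experts discuss mental health services in schools"),
    (2012, "Gun control bill introduced in Congress"),
    (2012, "Calls for stricter gun control grow"),
]

# Hypothetical keyword lists standing in for each frame
frames = {
    "gun_control": ["gun control"],
    "mental_health": ["mental health"],
}

# counts[year][frame] = number of articles mentioning that frame
counts = defaultdict(lambda: {frame: 0 for frame in frames})
for year, text in articles:
    lowered = text.lower()
    for frame, keywords in frames.items():
        if any(keyword in lowered for keyword in keywords):
            counts[year][frame] += 1

for year in sorted(counts):
    print(year, counts[year])
```

Keyword matching is a blunt instrument, since it misses synonyms and cannot tell how a term is being used, but it is a common first pass before more careful text analysis.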

![](./visuals/NYT-code1.png) 

![](./visuals/NYT-code2.png) 

![](./visuals/NYT-visual1.png) 

![](./visuals/NYT-code3.png)

![](./visuals/NYT-visual2.png) 

![](./visuals/NYT-code4.png)

![](./visuals/NYT-visual3.png)

### Summary: 
* Schools are safer now than they were in the 1990s.
    * Four times as many children were killed in schools in the early 1990s as today (Fox & Fridel 2018)
* The number of fatal school shootings is negatively correlated with time: fatal school shootings have decreased over the years
* Media coverage shows the opposite trend: the number of articles covering school shootings has increased over time
    * Coverage spikes for high-victim, high-profile shootings
* Mentions of mental health and gun control appear only in coverage of high-profile school shootings: Sandy Hook and Columbine 

# Introduction to Python 

## Code and Text written by: <br/>
## Laura Nelson ~~ *Assistant Professor of Sociology*<br/><br/>Northeastern University

<br/>

# Introduction

It is increasingly important to learn a scripting language, such as Python or R, in order to access, collect, and structure data from diverse sources, and analyze data using new and developing methods, such as machine learning.

There is no way you will remember all of this, and that is completely normal. Focus on learning the syntax, but also on getting a higher-level understanding of the way Python works. This is just a basic introduction to how Python works and what you can do with it, but there are a lot of resources out there if this is something that interests you. If you've never written in Python, all of this may feel very strange to you. It gets easier as you work with it more.

# Learning Goals

- Understand what Python is, why it is useful, and how to use Python for data analysis.
- Understand how Python interacts with, and represents, data. 

# Learning Outcomes

- Learn and be able to explain Python basics - introduction, arithmetic, dataframes, visualizations
    
# Workshop Outline
1. Python basics
2. Dataframes
3. Visualizations


# 1. Python basics

In [None]:
# Python is all about functions, variables, and doing things to variables using functions
# the print function

print("Hello, world!")
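
To make the idea of variables and functions a little more concrete, here is a minimal sketch. The names `greeting`, `name`, and `shout` are made up for this illustration and are not part of the workshop materials.

```python
# A variable stores a value under a name
greeting = "Hello"
name = "world"

# A function takes inputs, does something, and returns a result
def shout(text):
    """Return the text in upper case with an exclamation mark."""
    return text.upper() + "!"

# Built-in functions such as len() and print() work the same way
message = shout(greeting + ", " + name)
print(message)       # HELLO, WORLD!
print(len(message))  # 13
```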

In [None]:
##try printing something yourself here!


<a id='arithmetic'></a>
### Arithmetic

In [None]:
# Computers are really good at arithmetic
# Addition

2+5

In [None]:
# Let's have Python report the results from three operations at the same time

print(2-5)
print(2*5)
print(2/5)
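
Beyond the four basic operations, a few other operators come up often. The sketch below is an optional aside, with variable names chosen only for illustration.

```python
# Division with / always returns a decimal number (a float)
print(2 / 5)    # 0.4

# Floor division (//) drops the fractional part; % gives the remainder
print(17 // 5)  # 3
print(17 % 5)   # 2

# ** raises a number to a power
print(2 ** 5)   # 32

# Parentheses control the order of operations
without_parens = 2 + 5 * 3    # multiplication happens first
with_parens = (2 + 5) * 3     # addition happens first
print(without_parens, with_parens)  # 17 21
```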

In [None]:
## Take 5 minutes and try some arithmetic here!


# The Pandas Dataframe

******************************
The data we'll analyze today comes from:

National Center for Education Statistics, United States Department of Education. (2009). Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K) [Data file]. Available from http://nces.ed.gov/ecls/kindergarten.asp

I selected five variables (columns) to analyze:

* reading_score = READING IRT SCALE SCORE
* math_score = MATH IRT SCALE SCORE
* knowledge_score = GENERAL KNOWLEDGE IRT SCALE SCORE
* p2income = TOTAL HOUSEHOLD INCOME
* incomecat = INCOME CATEGORIES
    * 1 = low income: < \$40,000
    * 2 = mid income
    * 3 = high income: >= \$70,000
    
The unit of observation (row) is the individual kindergartner.  
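
To picture the layout described above, the sketch below builds a tiny table with the same column names. The values are made up and are not rows from ECLS-K. The `income_category` helper is hypothetical, and it assumes that "mid income" covers everything from \$40,000 up to, but not including, \$70,000, which the list above implies but does not state.

```python
import pandas as pd

# Made-up values for illustration only; these are NOT rows from ECLS-K
toy = pd.DataFrame({
    "reading_score":   [30.5, 42.0, 55.2],
    "math_score":      [25.1, 38.4, 49.9],
    "knowledge_score": [20.0, 27.3, 35.8],
    "p2income":        [25000, 55000, 90000],
})

def income_category(income):
    """Map household income to 1 (low), 2 (mid), or 3 (high).

    Assumes mid income runs from $40,000 up to, but not
    including, $70,000.
    """
    if income < 40000:
        return 1
    if income >= 70000:
        return 3
    return 2

# Each row is one (imaginary) kindergartner; each column is one variable
toy["incomecat"] = toy["p2income"].apply(income_category)
print(toy)
```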
   
## Motivating Questions

1. Are math, reading, and general knowledge scores related to household income in any predictable way?
2. Can you predict general knowledge scores from reading or math scores? That is, are reading and math skills related to general knowledge?

In [None]:
#import our library 
import pandas

In [None]:
df = pandas.read_csv("education_dataset.csv", sep=',')
df

In [None]:
#It's easier to view the information when you have a piece of it instead of the entire thing, 
#especially when you have a LOT of data
df.head()

In [None]:
#syntax to extract columns
df['reading_score'].head()

In [None]:
## Try it yourself: Extract the knowledge score column

df['knowledge_score'].head()

In [None]:
#extract one row: notice the syntax and the ZERO
df.loc[0]

In [None]:
## Try it yourself - extract the 20th row of data 

df.loc[19]
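
A note on why `19` gives the 20th row: `.loc` looks rows up by their index *label*, and a freshly loaded dataframe is labeled 0, 1, 2, and so on. The sketch below uses a made-up toy table to show how labels and positions can drift apart after filtering, and how `.iloc` selects by position instead.

```python
import pandas as pd

# Toy data with made-up scores
toy = pd.DataFrame({"reading_score": [10, 20, 30, 40]})

# With the default index, labels and positions match
first_by_label = toy.loc[0, "reading_score"]
first_by_position = toy.iloc[0]["reading_score"]

# After filtering, labels stay the same but positions shift
high = toy[toy["reading_score"] > 15]
print(high.index.tolist())             # [1, 2, 3]
print(high.iloc[0]["reading_score"])   # 20: first row by position
print(high.loc[1, "reading_score"])    # 20: row whose label is 1
```

In the filtered table, `high.loc[0]` would raise a `KeyError` because no row carries the label 0 any more.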

## Summary Statistics

In [None]:
## Summary statistics: Mean
df['reading_score'].mean()

In [None]:
#Sum
df['reading_score'].sum()

In [None]:
#Standard deviation
df['reading_score'].std()

In [None]:
##Ex: explore one of the other columns - 
#find the MEAN, SUM, and STANDARD DEVIATION of another column that interests you!




In [None]:
#We can find it all at the same time

df.describe()
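
The sketch below, on a small made-up series, shows how the output of `describe()` lines up with the individual methods. It also illustrates one detail worth knowing: pandas reports the *sample* standard deviation by default.

```python
import pandas as pd

# Made-up numbers for illustration
scores = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

summary = scores.describe()
print(summary)

# describe() reports the same numbers as the individual methods
print(summary["mean"], scores.mean())                      # 5.0 5.0
print(round(summary["std"], 3), round(scores.std(), 3))    # 2.138 2.138

# pandas computes the sample standard deviation (divides by n - 1);
# pass ddof=0 for the population version (divides by n)
print(scores.std(ddof=0))  # 2.0
```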

# Visualization

In [None]:
import matplotlib.pyplot as plt

In [None]:
df.hist()
plt.show()

In [None]:
#That's not pretty. Let's show just one

df['knowledge_score'].hist()
plt.show()

In [None]:
## Try doing a histogram with another column that interests you

    

In [None]:
#Other options:
#Scatter plot: are math and reading scores correlated?

df.plot(kind='scatter', x = 'reading_score', y = 'math_score')
plt.show()
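
A scatter plot shows a relationship visually; a correlation coefficient puts a number on it. The sketch below uses made-up scores, not the workshop dataset, to show one way to compute it with pandas.

```python
import pandas as pd

# Made-up scores for illustration; not values from the workshop dataset
toy = pd.DataFrame({
    "reading_score": [20, 30, 40, 50, 60],
    "math_score":    [22, 29, 41, 48, 63],
})

# Pearson correlation: +1 is a perfect positive linear relationship,
# 0 is no linear relationship, -1 is a perfect negative one
r = toy["reading_score"].corr(toy["math_score"])
print(round(r, 3))

# corr() on the whole dataframe gives every pairwise correlation
print(toy.corr())
```

The same two calls should work on `df` with any pair of numeric columns, for example a score column and `p2income`.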

In [None]:
## Finish up by looking at scatterplots of different relationships on your own
## What are some patterns you find? Ex: Is there a relationship between scores and household income?



In [None]:
#advanced: Pandas groupby function
#group the rows of the dataframe by income category
df_grouped = df.groupby('incomecat')

#take the mean of every column within each income category
df_grouped_mean = df_grouped.mean()
df_grouped_mean

In [None]:
df_grouped_mean[['reading_score', 'math_score', 'knowledge_score']].plot(kind='bar')
plt.legend(loc=9, bbox_to_anchor=(0.5, -0.2), ncol = 3)
plt.show()
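
To see what `groupby` does under the hood, it can help to try it on a table small enough to check by hand. The sketch below uses made-up rows, not the workshop dataset.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "incomecat":     [1, 1, 2, 2, 3, 3],
    "reading_score": [30, 34, 40, 44, 50, 54],
})

# Split the rows by income category, then average each group
group_means = toy.groupby("incomecat")["reading_score"].mean()
print(group_means)

# The same result computed by hand for one group
low_income = toy[toy["incomecat"] == 1]
print(low_income["reading_score"].mean())  # 32.0
```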

## Want to learn more?

If you are interested in Computational Social Science, data analytics, ethical implications, and any of the topics we covered today, we encourage you to begin looking at potential courses or minors you might pursue!

- Computational Social Science minor
- Digital Minor
- Combined major in Computer Science and CSSH
- Other courses you might take: DS 2000/DS 2001 (Data Science) 


# Thank You!

If you have questions, contact us at:

### Cara Marta Messina
Digital Integration Teaching Initiative
Assistant Director
messina.c@husky.neu.edu

### Slides, handouts, and data available at [LINK TO GITHUB](INSERT LINK HERE)

### Schedule an appointment with us! [https://bit.ly/diti-office-hours](https://bit.ly/diti-office-hours)