Course Materials for DS-120 at Luther College
- Correlation
- Collaborative Filtering
- More Regression
For Friday
- Introducing Regression and basic stats
- for Wednesday -- Crickets data
- Finish our sentiment analysis model
- Test some examples
For Friday
- Buzzfeed/BBC Tennis Investigation
- Finding the Tennis Suspects
- Reading response on Katie
For Monday
- Take 1000 tweets from the test set and classify them.
- How many are classified correctly?
- Review Quiz 2
- Putting together the positive and negative summary tabs
- Making our first prediction
- Introduction to Bayesian Classification
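The "How many are classified correctly?" question above comes down to comparing predicted labels against the true labels. A minimal sketch in Python, assuming the predictions and true labels are already available as two lists of equal length (the function name and the label strings are made up for illustration):

```python
def accuracy(predicted, actual):
    """Return (number correct, fraction correct) for two equal-length label lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    # Count the positions where the predicted label equals the true label.
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct, correct / len(actual)


# Hypothetical labels for four tweets; a real run would use 1000 test tweets.
predicted = ["positive", "negative", "positive", "negative"]
actual = ["positive", "negative", "negative", "negative"]
print(accuracy(predicted, actual))
```

In Excel the same count could be made with a helper column that compares the two labels and a COUNTIF over that column.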
For Friday Discussion
- Listen to one of the following podcasts
- Data Skeptic -- Urban Congestion https://overcast.fm/+BzxPcuFIo
- FiveThirtyEight What's the Point -- Bear Tracks 9/16/16 https://overcast.fm/+Ez2iSeJdo
- Write a 500-word response on Katie, and write two questions for class discussion.
For Monday
- Find a dataset that will be appropriate for you to try your hand at clustering. You will turn in a summary of the dataset, the variables it contains, and why you think it is a good dataset for the K-means clustering algorithm. Remember the limitations of Excel! Solver can only manage 200 variables, so the number of variables times the number of clusters must be less than 200. By Monday, March 13, you will need to turn in an Excel spreadsheet with your completed clustering, showing your centroids, and an analysis of how accurately you were able to classify your training and test sets. If your dataset does not contain predetermined labels, then you should provide an analysis of what you think your clusters represent.
Here are two good sources for data sets to get you started:
- Work on Iris Data Clustering Assignment
- Review the Quiz
- Update the soccer team
- Introducing Machine Learning
- Supervised Learning
- Unsupervised Learning
- The machine learning pipeline
- K-means Clustering
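For comparison with the spreadsheet version of K-means, here is a minimal sketch in Python. It uses the standard assign-then-average iteration (Lloyd's algorithm), not the Solver-based approach used in class, and it assumes a naive starting point: the first k rows serve as the initial centroids. The function name and parameters are made up for illustration.

```python
def kmeans(points, k, iterations=20):
    """Cluster points (equal-length tuples of numbers) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index
    given to points[i] in the last assignment step.
    """
    # Assumption: use the first k points as starting centroids (a naive choice).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

The result depends on the starting centroids, so a different ordering of the rows can produce different clusters.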
- Assignment for Wednesday
- Optimize the Concession stand profits using the following constraints:
- The total number of items sold must be 200 or less
- You cannot sell more than 25 hamburgers
- You only have 30 Popsicles and 30 ice cream sandwiches
- You can sell a maximum of 20 pizzas
- The combination of water, beer, and soda must be more than the total of hamburgers, pizza, and hot dogs
- Finish up the Fantasy Football Optimization
- Quiz on Excel functions
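The concession stand constraints in the assignment above can be written down as a simple feasibility check, which is a handy way to confirm that a Solver answer actually obeys every rule. This is only a sketch: the item names used as dictionary keys are assumptions and may not match the labels in the Concessions workbook, and it checks a plan rather than optimizing one.

```python
def violated_constraints(plan):
    """Return a list of the constraints a plan breaks (empty list = feasible).

    plan maps an item name to the quantity sold. The item names are
    hypothetical and may differ from the workbook's labels.
    """
    def qty(name):
        return plan.get(name, 0)

    problems = []
    if sum(plan.values()) > 200:
        problems.append("total items sold exceeds 200")
    if qty("hamburger") > 25:
        problems.append("more than 25 hamburgers")
    if qty("popsicle") > 30:
        problems.append("more than 30 popsicles")
    if qty("ice cream sandwich") > 30:
        problems.append("more than 30 ice cream sandwiches")
    if qty("pizza") > 20:
        problems.append("more than 20 pizzas")
    drinks = qty("water") + qty("beer") + qty("soda")
    food = qty("hamburger") + qty("pizza") + qty("hot dog")
    if drinks <= food:  # drinks must be strictly more than food
        problems.append("drinks do not exceed hamburgers, pizza, and hot dogs")
    return problems
```

Calling the function on the quantities Solver reports should return an empty list if the constraints were entered correctly.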
For Friday
- Readings:
- Using The Solver in Excel
- USDA Nutrition Data
For Wednesday
- Download the Concessions workbook
- Find the Calories-Solver tab. You will also need to enable the Solver extension as described in the book. Modify the sheet so that it has the following constraints and goals: we want to find the lowest-priced meal with 2400 calories that includes at least one beverage, at least one main meal item, and at least one dessert item.
- You can upload your finished workbook to Katie.
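To see what the Calories-Solver exercise is asking for, here is a small brute-force sketch in Python. It is not how Solver works internally; it simply tries every combination of quantities. The menu items, prices, and calorie counts are invented for illustration (the real numbers are in the workbook), and "with 2400 calories" is read here as "at least 2400 calories", which is an assumption.

```python
from itertools import product

# Hypothetical menu: (name, category, price in dollars, calories).
MENU = [
    ("soda", "beverage", 2, 150),
    ("hot dog", "main", 3, 300),
    ("pizza", "main", 4, 500),
    ("ice cream sandwich", "dessert", 2, 200),
]


def cheapest_meal(menu, min_calories=2400, max_each=8):
    """Try every quantity from 0 to max_each of each item and keep the cheapest
    combination that reaches min_calories and includes at least one beverage,
    one main item, and one dessert. Returns (cost, calories, quantities) or None."""
    required = {"beverage", "main", "dessert"}
    best = None
    for quantities in product(range(max_each + 1), repeat=len(menu)):
        calories = sum(q * item[3] for q, item in zip(quantities, menu))
        if calories < min_calories:
            continue
        categories = {item[1] for q, item in zip(quantities, menu) if q > 0}
        if not required <= categories:
            continue
        cost = sum(q * item[2] for q, item in zip(quantities, menu))
        if best is None or cost < best[0]:
            names = [item[0] for item in menu]
            best = (cost, calories, dict(zip(names, quantities)))
    return best


print(cheapest_meal(MENU))
```

Brute force only works for a handful of items; with a full nutrition table the number of combinations grows too quickly, which is why the exercise uses Solver.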
Reminder: Excel Quiz on Wednesday
- Topics
- Review match, index
- VLOOKUP -- Making a nutrition database
- Readings For Friday
- The following readings bring up some interesting issues regarding the use of Big Data in the courtroom. Sometimes there are unintended consequences when we willingly give up information that seems harmless.
- When Fitbit is the Expert witness
- Divorced by Data
- Topics
- We will work on developing some summary information on the Nutrition Spreadsheet
- Key Excel Functions from today:
- sum
- average
- max
- match
- index
- sumif
- countif
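For anyone who thinks in code, the match and index pair from the list above can be sketched in Python. This is a rough analogy, assuming an exact-match lookup like MATCH(value, range, 0); the function names mirror the Excel ones, and the food and calorie lists are made up. Unlike Excel, this sketch is case-sensitive when comparing text.

```python
def match(lookup_value, lookup_list):
    """1-based position of the first exact match, like MATCH(value, range, 0).

    Raises ValueError when the value is missing (Excel would show #N/A).
    """
    return lookup_list.index(lookup_value) + 1


def index(values, position):
    """Value at a 1-based position, like INDEX(range, row)."""
    return values[position - 1]


# Hypothetical nutrition columns.
foods = ["apple", "banana", "cheese"]
calories = [95, 105, 113]

# Equivalent in spirit to =INDEX(calories, MATCH("banana", foods, 0))
print(index(calories, match("banana", foods)))
```

Splitting the lookup into "find the row" and "read the value" is what lets the lookup column sit anywhere relative to the result column.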
- Assignment
- For Wednesday, develop a simple expense tracking spreadsheet with the following columns:
- date
- purchaser (name of the person who bought the item)
- category (grocery, toiletry, lawn & garden, etc.)
- item
- cost
- store (where you purchased it)
- Add 15 to 20 rows of data and then create a summary of the total cost and average cost of all items. Use the SUMIF function to calculate the total for each category, and the SUMIF and COUNTIF functions to calculate the average for items purchased in a particular store.
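The SUMIF and COUNTIF logic in the expense tracking assignment can be sketched in Python as a check on your spreadsheet results. The function names mirror the Excel functions but only handle exact-match criteria, and the sample rows are invented for illustration.

```python
def sumif(criteria_values, criterion, sum_values):
    """Add up sum_values wherever the matching criteria_values entry equals criterion."""
    return sum(v for c, v in zip(criteria_values, sum_values) if c == criterion)


def countif(values, criterion):
    """Count the entries in values that equal criterion."""
    return sum(1 for v in values if v == criterion)


# Hypothetical expense columns; a real sheet would have 15 to 20 rows.
category = ["grocery", "toiletry", "grocery"]
store = ["Store A", "Store B", "Store A"]
cost = [10.0, 4.0, 6.0]

# Total for one category, like =SUMIF(category, "grocery", cost)
grocery_total = sumif(category, "grocery", cost)

# Average for one store: SUMIF divided by COUNTIF
store_a_average = sumif(store, "Store A", cost) / countif(store, "Store A")
print(grocery_total, store_a_average)
```

Dividing the SUMIF result by the COUNTIF result is the same trick the assignment asks for when averaging the items bought at a particular store.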
- For Wednesday, read chapter 1 in Data Smart
- Topics
- Discussion of the readings
- Find an interesting "data story" -- What kind of story is it? (See the 10 kinds of stories article.)
- Analyze the "data story" carefully -- does it meet the criteria for objectivity? honesty?
- Starting in on Excel
- Assignment
- Create a spreadsheet of nutritional data for what you eat over the weekend (or a made-up diet; I don't need to know what you ate). The columns in the spreadsheet should be:
- Meal
- Food
- calories
- protein
- carbs
- fat
I understand the Caf has nutritional data posted for the meals, or you can use the USDA website or another app that has nutrition data about common foods. The point is to make a spreadsheet with "a bunch" of rows in it.
- Topics
- What is Data Science?
- Course Administrivia
- What do you know about Excel?
- Assignment
- Read the Week 1 readings and write a 500-word response: Why are you interested in data science? What experiences, if any, do you have with data science-related things? What did you take away from the readings?
For Friday, read the following articles, then write a 500-word response to the prompts on Katie.