Daily To-do Assignments

To-do #1

Due 1/9 (Thu), 12:30pm

The Internet is full of published linguistic data sets. Let's data-surf! Instructions:

  1. Go out and find two linguistic data sets you like. One should be a corpus; the other should be in some other format. They must be free and downloadable in full. Make sure they are linguistic data sets, that is, designed for linguistic inquiry.
  2. You might want to start with various bookmark sites listed in the following Learning Resources sections: Linguistic Data, Open Access, Data Publishing, and Corpus Linguistics. But don't be constrained by them.
  3. Download the data sets and poke around. Open up a file or two to take a peek. (No need to do this in Python.)
  4. In a text file (should have the .txt extension), make note of:
  • The name of the data resource
  • The author(s)
  • The URL of the download page
  • Its makeup: size, type of language, format, etc.
  • License: whether it comes with one, and if so what kind?
  • Anything else noteworthy about the data. A sentence or two will do.
  5. If you are comfortable with markdown, make an .md file instead of a text file.

SUBMISSION: Upload your text file to the To-do1 submission link on CourseWeb. If you do not have CourseWeb access, email your submission to Jevon and cc Cassie and John.

To-do #2

Due 1/16 (Thu), 12:30pm

Learn about the NumPy library: study the Python Data Science Handbook and/or the NumPy documentation. While doing so, create your own study notes as a Jupyter Notebook file entitled numpy_notes_yourname.ipynb. Include examples, explanations, etc. You could also replicate DataCamp's examples. You are essentially creating your own reference material.
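
As a rough idea of what a notes cell might contain, here is a minimal sketch covering vectorized arithmetic, boolean masking, and axis-wise aggregation. The array contents and variable names are made up for illustration; your own notes should follow whatever topics you find most useful.

```python
import numpy as np

# Toy data (hypothetical): token counts for four short documents.
token_counts = np.array([120, 85, 240, 60])

# Vectorized arithmetic: each count as a share of the total.
shares = token_counts / token_counts.sum()

# Boolean masking: keep only documents with at least 100 tokens.
long_docs = token_counts[token_counts >= 100]

# Reshaping and axis-wise aggregation on a 2-D array.
grid = np.arange(6).reshape(2, 3)   # rows: [0, 1, 2] and [3, 4, 5]
col_sums = grid.sum(axis=0)         # one sum per column

print(shares.round(3))
print(long_docs)
print(col_sums)
```

Pairing each snippet with a sentence or two on what it does and why is what turns a cell like this into reference material.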

SUBMISSION: Your file should be in the todo2/ directory of the Class-Exercise-Repo. Make sure your local repo has the "upstream" remote configured and your fork is up-to-date. Push to your GitHub fork, and create a pull request for me.

To-do #3

Due 1/21 (Tue)

Study the pandas library (through the Python Data Science Handbook and/or the documentation). pandas is a big topic with lots to learn: aim to cover about half of it. While doing so, try it out on TWO spreadsheet (.csv, .tsv, etc.) files:

  1. The first file should be your choice. You can get one from this CSV Files archive, or make up your own. Keep it super simple! It's supposed to be a toy dataset.
  2. The second one should be billboard_lyrics_1964-2015.csv by Kaylin Pavlik, from her project '50 Years of Pop Music'. (Note: you might need to specify ISO8859 encoding.)

Name your Jupyter Notebook file pandas_notes_yourname.ipynb. Don't change the filename of any downloaded CSV files or edit them in any way.
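
The encoding note above can be sketched as follows. This is a minimal illustration that builds a tiny stand-in file rather than using the actual lyrics CSV; the file name `toy_latin1.csv`, its columns, and its rows are all hypothetical. The only pandas feature it relies on is the `encoding` parameter of `pd.read_csv`.

```python
import os
import tempfile

import pandas as pd

# Build a tiny stand-in CSV encoded as ISO-8859-1 (Latin-1).
# The file name, columns, and rows are made up for illustration.
rows = "title,artist\nDéjà Vu,Some Artist\nPlain Title,Another Artist\n"
path = os.path.join(tempfile.mkdtemp(), "toy_latin1.csv")
with open(path, "w", encoding="ISO-8859-1") as f:
    f.write(rows)

# Reading with the default UTF-8 codec typically fails on the accented bytes...
try:
    pd.read_csv(path)
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ...while naming the encoding explicitly succeeds.
df = pd.read_csv(path, encoding="ISO-8859-1")

print("Read OK with default encoding:", utf8_ok)
print(df.head())
```

If a downloaded file raises a `UnicodeDecodeError` on load, passing an explicit `encoding` this way is usually the first thing to try; the file itself stays unedited.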

SUBMISSION: Your files should be in the todo3/ directory of Class-Exercise-Repo. Commit and push all three files to your GitHub fork, and create a pull request for me.

To-do #4

Due 1/23 (Thu)

This one is a continuation of To-do #3: work further on your pandas study notes. You may create a new Jupyter Notebook file, or you can expand the existing one. Also: try out a spreadsheet submitted by a classmate. You are welcome to view the classmate's notebook to see what they did with it. (How to find out who submitted what? Git/GitHub history, of course.) Give them a shout-out.
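
When trying out someone else's spreadsheet, a few standard pandas operations go a long way. The sketch below uses a small in-memory table with made-up data in place of a classmate's file; the column names are assumptions chosen for illustration.

```python
import pandas as pd

# Toy table (hypothetical data) standing in for a classmate's spreadsheet.
df = pd.DataFrame({
    "word": ["the", "linguist", "reads", "a", "corpus", "daily"],
    "pos": ["DET", "NOUN", "VERB", "DET", "NOUN", "ADV"],
})

# Derive a new column from an existing one.
df["length"] = df["word"].str.len()

# Count how often each category occurs.
pos_counts = df["pos"].value_counts()

# Group by category and aggregate.
mean_len = df.groupby("pos")["length"].mean()

# Filter rows with a boolean condition.
long_words = df[df["length"] > 4]

print(pos_counts)
print(mean_len)
print(long_words)
```

With a real file, the first line would be a `pd.read_csv(...)` call instead of the `pd.DataFrame(...)` constructor; the rest of the steps carry over unchanged.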

SUBMISSION: We'll stick to the todo3/ directory in Class-Exercise-Repo. Push to your GitHub fork, and create a pull request for me.