There will be no class on Friday, November 8th. On Friday, November 15th, we will have supervision of groups and a presentation by the group that has not yet presented. You may reserve a slot (15-20 minutes) for individual (group) supervision on November 15th. To request supervision, send me an email with your preferred time slot. Slots are available between 12:15 and 15:00 on November 15th. Please come to the supervision session prepared with questions or discussion points.
Important note: If you want to postpone supervision, send me an email. I will also be available for supervision on November 22nd.
The maximum score was 12 points, with one point given per subtask. The total score was converted to a percentage between 0 and 100%, and grades were set based on the grading scale below:
- A = 92 - 100
- B = 77 - 91
- C = 58 - 76
- D = 46 - 57
- E = 40 - 45
- F = 0 - 39
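
As an illustration of the arithmetic, the sketch below converts a point total into a percentage and looks up the letter grade. It rests on assumptions the scale above does not state: the percentage is not rounded, and each grade applies from its lower bound up to the next band (so 91.67% would be a B). The names `GRADE_LOWER_BOUNDS`, `percentage`, and `grade_from_points` are hypothetical and only used for this example.

```python
# Lower bound (in percent) of each grade band, copied from the scale above.
# Ordered from best to worst so the first match is the right grade.
GRADE_LOWER_BOUNDS = [("A", 92), ("B", 77), ("C", 58), ("D", 46), ("E", 40), ("F", 0)]


def percentage(points, max_points=12):
    """Convert a point total to a percentage of the maximum score."""
    if not 0 <= points <= max_points:
        raise ValueError(f"points must be between 0 and {max_points}")
    return 100 * points / max_points


def grade_from_points(points, max_points=12):
    """Return the letter grade for a point total.

    Assumption: no rounding is applied, and a grade applies to every
    percentage at or above its lower bound.
    """
    pct = percentage(points, max_points)
    for letter, lower_bound in GRADE_LOWER_BOUNDS:
        if pct >= lower_bound:
            return letter


if __name__ == "__main__":
    for pts in range(13):
        print(pts, round(percentage(pts), 2), grade_from_points(pts))
```

Under a different rounding rule the borderline cases could shift, so the printed grades should be read as an illustration of the lookup rather than as the official result.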
Course material for GRA 4157 - (Big) Data Curation, Pipelines, and Management.
04-10-2024 - Mid-term exam (40%) 09:00 - 11:00. Room D3-141. Technical knowledge, concepts from programming with data.
07-11-2024 - The final exam (60%) is a written report based on two group presentations (1 - 3 per group) during the semester.
Lectures will be held each Friday 12:00-13:45 between August 23rd and November 8th. You may contact me at vegard@xal.no.
https://rl.talis.com/3/binorway/lists/4D39CD33-F47E-E95D-1F5B-0511BBC9B6BF.html
Part 1
- Basic Python lists, dictionaries and operations.
- Reading from and writing to files, flexible solutions.
- Numerical Python with NumPy, arrays, array slicing for vectorized computations.
- Code standards, version control and code-collaboration.
Part 2
- Working with the pandas library
- Reading data from websites
- Data visualisation
Part 3
- Cleaning data, combining data sets
- Machine learning workflows with scikit-learn
- Assessing machine learning models based on various assumptions about the data (outliers etc.)
For a given lecture, the reading gives an approximate overview of what you are expected to know after the lecture. I expect you to solve the exercises after the lecture. Each week, we start the lecture with a student presentation of an exercise of choice. Send an email to vegard@xal.no to volunteer for an exercise. For exercises on pandas, we refer to w3resource (W3): https://www.w3resource.com/python-exercises/pandas/index-dataframe.php
Date | Topic | Reading | Exercises | Student presentation |
---|---|---|---|---|
Aug. 23 | Course Introduction. Python recap, lists and dictionaries. Testing. Decorators. | Sundnes: Chap 1,2,3 (and 7) | Sundnes: 2.7, 2.8, 2.9, 2.15, 2.18, 3.3, 3.6, 3.17 | |
Aug. 30 | Reading and writing to file. User input. Exceptions. More on command line arguments | Sundnes: Chap 5 | Sundnes: 4.4, 4.9, 4.10, 4.12, 4.13, 4.17, 4.23 | Yulin Vera: 2.15 |
Sep. 06 | Numerical Python and plotting | Sundnes: Chap 6 | Sundnes: 5.1, 5.2, 5.3, 5.4, 5.10, 5.12, 5.14, 5.28, 5.46, 5.54 | Shan Xu: 4.4 Bohdan: 4.23 |
Sep. 13 | Pandas | McKinney: Chap 5 | W3: DataFrames: 2.-22., 73 | Yurou 5.2 Nhung: 5.46 |
Sep. 20 | Web scraping | McKinney: Chap 6 | W3: Pandas Performance: 1.-20. (select 5-10 exercises) + GitHub Exercises. Note: Some changes were made to the exercises on Sep. 24 | |
Sep. 27 | GitHub, Pipelines, GitHub Actions | | | Selena: 1 Ái Linh, Eirik: 2 Ilia: 3 Narges: 4 Johannes: 4 |
||
Oct. 1 | Q & A Mid-term 08:00 - 09:45 | Previous lectures | Room C2-055 | |
Oct. 04 | Mid-term 09:00 - 11:00 | Room D3-141 | ||
Oct. 11 | Machine learning part 1 | Project 1 | ||
Oct. 18 | Group presentations | Project 2 | ||
Oct. 25 | Machine learning part 2. Scientific writing | |||
Nov. 01 | Group presentations | |||
Nov. 15 | Group presentations (only groups that did not present on Nov. 01) and supervision | |||