Course material for GRA 4157 - (Big) Data Curation, Pipelines, and Management

BI-DS/GRA4157

GRA4157

Final weeks of class

There will be no class on Friday, November 8. On Friday, November 15, we will have group supervision and a presentation by the group that has not yet presented. You may book a 15-20 minute slot for individual (group) supervision on November 15. To request supervision, send me an email with your requested time slot; slots are available between 12:15 and 15:00. Please come prepared with questions or discussion points for the supervision session.

Important note: If you want to postpone supervision, send me an email. I will also be available for supervision on November 22.

Mid-term grading

The maximum score was 12 points, with one point given per subtask. Each total score was converted to a percentage between 0 and 100%, and grades were set according to the grading scale below:

  • A = 92 - 100
  • B = 77 - 91
  • C = 58 - 76
  • D = 46 - 57
  • E = 40 - 45
  • F = 0 - 39
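As an illustration only, the conversion from points to a letter grade can be sketched in Python. This sketch assumes that each band's lower bound acts as a threshold and that percentages are not rounded (the published bands do not say how values such as 11/12 = 91.7% are treated). The function `mid_term_grade` is hypothetical and is not necessarily how the grades were actually computed.

```python
# Hypothetical helper. ASSUMPTION: each band's lower bound is used as a
# threshold and the percentage is not rounded before comparison.
GRADE_THRESHOLDS = [(92, "A"), (77, "B"), (58, "C"), (46, "D"), (40, "E"), (0, "F")]


def mid_term_grade(points: float, max_points: float = 12) -> str:
    """Convert a mid-term score in points to a letter grade."""
    if not 0 <= points <= max_points:
        raise ValueError("points must be between 0 and max_points")
    percent = 100 * points / max_points  # total score as 0-100%
    for lower_bound, letter in GRADE_THRESHOLDS:
        if percent >= lower_bound:
            return letter
    return "F"  # unreachable, since the last threshold is 0


if __name__ == "__main__":
    for p in range(13):
        print(p, round(100 * p / 12, 1), mid_term_grade(p))
```

Under this assumption, 7 of 12 points (58.3%) gives a C, while 6 points (50%) gives a D.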

Exams

04-10-2024 - Mid-term exam (40%), 09:00 - 11:00, Room D3-141. Covers technical knowledge and concepts from programming with data.

07-11-2024 - The final exam (60%) is a written report based on two group presentations (1 - 3 per group) during the semester.

Lectures

Lectures will be held every Friday, 12:00 - 13:45, between August 23 and November 8. You may contact me at vegard@xal.no.

Syllabus

https://rl.talis.com/3/binorway/lists/4D39CD33-F47E-E95D-1F5B-0511BBC9B6BF.html

Topics

Part 1

  • Basic Python lists, dictionaries and operations.
  • Reading from and writing to files, flexible solutions.
  • Numerical Python with NumPy: arrays and array slicing for vectorized computations.
  • Code standards, version control and code-collaboration.

Part 2

  • Working with the pandas library
  • Reading data from websites
  • Data visualisation

Part 3

  • Cleaning data, combining data sets
  • Machine learning workflows with scikit-learn
  • Assessing machine learning models under various assumptions about the data (outliers, etc.)

Preliminary lecture plan

For a given lecture, the reading gives an approximate overview of what you are expected to know after the lecture. I expect you to solve the exercises after the lecture. Each week, we start the lecture with a student presentation of an exercise of the student's choice. Send an email to vegard@xal.no to volunteer for an exercise. For pandas exercises, we refer to w3resource (W3): https://www.w3resource.com/python-exercises/pandas/index-dataframe.php

| Date | Topic | Reading | Exercises | Student presentation |
| --- | --- | --- | --- | --- |
| Aug. 23 | Course introduction. Python recap, lists and dictionaries. Testing. Decorators. | Sundnes: Chap. 1, 2, 3 (and 7) | Sundnes: 2.7, 2.8, 2.9, 2.15, 2.18, 3.3, 3.6, 3.17 | |
| Aug. 30 | Reading and writing to file. User input. Exceptions. More on command line arguments. | Sundnes: Chap. 5 | Sundnes: 4.4, 4.9, 4.10, 4.12, 4.13, 4.17, 4.23 | Yulin; Vera: 2.15 |
| Sep. 06 | Numerical Python and plotting | Sundnes: Chap. 6 | Sundnes: 5.1, 5.2, 5.3, 5.4, 5.10, 5.12, 5.14, 5.28, 5.46, 5.54 | Shan Xu: 4.4; Bohdan: 4.23 |
| Sep. 13 | Pandas | McKinney: Chap. 5 | W3: DataFrames: 2.-22., 73 | Yurou: 5.2; Nhung: 5.46 |
| Sep. 20 | Web scraping | McKinney: Chap. 6 | W3: Pandas Performance: 1.-20. (select 5-10 exercises) + GitHub Exercises. Note: Some changes were made to the exercises on Sept. 24. | |
| Sep. 27 | GitHub, pipelines, GitHub Actions | | | Selena: 1; Ái Linh, Eirik: 2; Ilia: 3; Narges: 4; Johannes: 4 |
| Oct. 1 | Q & A for the mid-term, 08:00 - 09:45, Room C2-055 | Previous lectures | | |
| Oct. 04 | Mid-term, 09:00 - 11:00, Room D3-141 | | | |
| Oct. 11 | Machine learning part 1 | | Project 1 | |
| Oct. 18 | Group presentations | | Project 2 | |
| Oct. 25 | Machine learning part 2. Scientific writing | | | |
| Nov. 01 | Group presentations | | | |
| Nov. 15 | Group presentations (only groups that did not present on Nov. 01) and supervision | | | |
