# Lab 1 - SQL

*Objective:* to practice writing SQL queries.

To run this lab as a `jupyter` notebook, you can download it
[here](lab1.zip) (the zip-file contains the notebook and the
database).

## Background

We have a database that handles the academic achievements of
students at LTH. It has three tables:

* `students` -- contains student data:
   - `ssn` -- social security number ('personnummer')
   - `first_name`
   - `last_name`


* `courses` -- describes the courses:
   - `course_code`
   - `course_name`
   - `level` ("G1", "G2", or "A")
   - `credits`

  
* `taken_courses` -- keeps track of which courses the
   students have taken; once a student has passed a course,
   we add a row to this table:
   - `ssn` -- the social security number of the student
   - `course_code` -- what course has been taken
   - `grade`

![There should be an image here](lab1.png)

Some sample data:

~~~ {.text}
ssn           first_name   last_name
---           ----------   ---------
861103-2438   Bo           Ek
911212-1746   Eva          Alm
950829-1848   Anna         Nyström
...           ...          ...

course_code   course_name                   level    credits
-----------   -----------                   -----    -------
EDA016        Programmeringsteknik          G1       7.5
EDAA01        Programmeringsteknik - FK     G1       7.5
EDA230        Optimerande kompilatorer      A        7.5
...           ...                           ...      ...

ssn           course_code   grade
---           -----------   -----
861103-2438   EDA016        4
861103-2438   EDAA01        3
911212-1746   EDA016        3
...           ...           ...
~~~


The tables have been created with the following SQL
statements:

~~~ {.sql}
CREATE TABLE students (
  ssn          CHAR(11),
  first_name   TEXT NOT NULL,
  last_name    TEXT NOT NULL,
  PRIMARY KEY  (ssn)
);

CREATE TABLE courses (
  course_code   CHAR(6),
  course_name   TEXT NOT NULL,
  level         CHAR(2),
  credits       DOUBLE NOT NULL CHECK (credits > 0),
  PRIMARY KEY   (course_code)
);

CREATE TABLE taken_courses (
  ssn           CHAR(11),
  course_code   CHAR(6),
  grade         INTEGER NOT NULL CHECK (grade >= 3 AND grade <= 5),
  PRIMARY KEY   (ssn, course_code),
  FOREIGN KEY   (ssn) REFERENCES students(ssn),
  FOREIGN KEY   (course_code) REFERENCES courses(course_code)
);
~~~
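
The constraints in these statements can be tried out without touching the lab database. The following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database; the helper `try_insert` and the handful of rows are made up for illustration. Note that SQLite only enforces the `FOREIGN KEY` clauses when `PRAGMA foreign_keys` is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores FOREIGN KEY clauses unless this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE students (
  ssn CHAR(11), first_name TEXT NOT NULL, last_name TEXT NOT NULL,
  PRIMARY KEY (ssn)
);
CREATE TABLE courses (
  course_code CHAR(6), course_name TEXT NOT NULL, level CHAR(2),
  credits DOUBLE NOT NULL CHECK (credits > 0),
  PRIMARY KEY (course_code)
);
CREATE TABLE taken_courses (
  ssn CHAR(11), course_code CHAR(6),
  grade INTEGER NOT NULL CHECK (grade >= 3 AND grade <= 5),
  PRIMARY KEY (ssn, course_code),
  FOREIGN KEY (ssn) REFERENCES students(ssn),
  FOREIGN KEY (course_code) REFERENCES courses(course_code)
);
INSERT INTO students VALUES ('911212-1746', 'Eva', 'Alm');
INSERT INTO courses VALUES ('EDA016', 'Programmeringsteknik', 'G1', 7.5),
                           ('EDAA01', 'Programmeringsteknik - FK', 'G1', 7.5);
""")


def try_insert(row):
    """Hypothetical helper: insert one taken_courses row.

    Returns None on success, otherwise the constraint error message.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO taken_courses VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


passed = try_insert(("911212-1746", "EDA016", 3))           # accepted
low_grade = try_insert(("911212-1746", "EDAA01", 2))        # violates the CHECK on grade
duplicate = try_insert(("911212-1746", "EDA016", 5))        # violates the PRIMARY KEY
unknown_student = try_insert(("000000-0000", "EDA016", 4))  # violates the FOREIGN KEY

for message in (passed, low_grade, duplicate, unknown_student):
    print(message)
```

The exact wording of the error messages depends on the SQLite version, so the sketch only prints them.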


All courses offered in the "Computer Science and
Engineering" program at LTH during the academic year 2013/14
are in the table `courses`. Also, the database has been
filled with made-up data. SQL statements like the following
have been used to insert the data:

~~~ {.sql}
INSERT INTO students (ssn, first_name, last_name)
VALUES ('950705-2308', 'Anna', 'Johansson'),
       ('930702-3582', 'Anna', 'Johansson'),
       ('911212-1746', 'Eva', 'Alm'),
       ('910707-3787', 'Eva', 'Nilsson'),
       ...
~~~


## Assignments

In [1]:
%load_ext sql

In [2]:
%sql sqlite:///lab1.sqlite

'Connected: None@lab1.sqlite'

The tables `students`, `courses` and `taken_courses` already
exist in your database. If you change the contents of the
tables, you can always recreate them with the following
command (at the command line):

~~~ {.sh}
sqlite3 lab1.db < setup-lab1-db.sql
~~~


After some of the questions there is a number in brackets.
This is the number of rows returned by the query. For
instance, [72] after question a) means that there are 72
students in the database.
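
One way to compare a query against the bracketed number is to wrap it in `SELECT COUNT(*) FROM (...)`. The sketch below assumes Python's `sqlite3` module and a tiny in-memory table holding the four students from the `INSERT` example above; `row_count` is a hypothetical helper, and the query passed to it must not end with a semicolon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (ssn CHAR(11) PRIMARY KEY, first_name TEXT, last_name TEXT);
INSERT INTO students (ssn, first_name, last_name)
VALUES ('950705-2308', 'Anna', 'Johansson'),
       ('930702-3582', 'Anna', 'Johansson'),
       ('911212-1746', 'Eva', 'Alm'),
       ('910707-3787', 'Eva', 'Nilsson');
""")


def row_count(connection, query):
    """Count the rows a SELECT returns by using it as a subquery."""
    return connection.execute(f"SELECT COUNT(*) FROM ({query})").fetchone()[0]


n = row_count(conn, "SELECT first_name, last_name FROM students")
print(n)  # 4 for this toy table; the lab database gives 72
```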

a) What are the names (first name, last name) of all the
   students? [72]

In [3]:
%%sql
SELECT first_name, last_name
FROM students

Done.


first_name,last_name
Anna,Johansson
Anna,Johansson
Eva,Alm
Eva,Nilsson
Elaine,Robertson
Maria,Nordman
Helena,Troberg
Lotta,Emanuelsson
Anna,Nyström
Maria,Andersson


b) Same as question a) but produce a sorted listing. Sort
   first by last name and then by first name.

In [60]:
%%sql
SELECT last_name, first_name
FROM   students
ORDER BY last_name, first_name

Done.


last_name,first_name
Ahlman,Daniel
Alm,Eva
Alm,Martin
Andersson,Erik
Andersson,Erik
Andersson,Maria
Andersson,Niklas
Aspegren,Märit
Axelsson,Daniel
Berg,Henrik


c) What are the names of the students who were born in 1985?
   [4]

In [15]:
%%sql
SELECT first_name, ssn
FROM students
WHERE ssn LIKE '85%'

Done.


first_name,ssn
Ulrika,850706-2762
Bo,850819-2139
Filip,850517-2597
Henrik,850208-1213


d) The next-to-last digit in the social security number is
   even for females, and odd for males. List the names of
   all female students in our database. Hint: the `SUBSTR`
   function can be useful. [26]

In [66]:
%%sql
SELECT first_name, ssn
FROM students
WHERE SUBSTR(ssn, 10, 1) % 2 = 0

Done.


first_name,ssn
Anna,950705-2308
Anna,930702-3582
Eva,911212-1746
Eva,910707-3787
Elaine,931213-2824
Maria,951122-1048
Helena,910308-1826
Lotta,941003-1225
Anna,950829-1848
Maria,860819-2864
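
The parity test in question d) relies on two details: `SUBSTR` counts positions from 1 (so position 10 of an 11-character `ssn` is the next-to-last digit), and SQLite converts the one-character string to an integer before applying `%`. The following is a small sketch of that behavior using Python's `sqlite3` module; the function name `tenth_digit_is_even` is made up, and the two numbers are taken from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def tenth_digit_is_even(ssn):
    """Return (digit, is_even) for the next-to-last character of an ssn."""
    digit, is_even = conn.execute(
        "SELECT SUBSTR(?, 10, 1), SUBSTR(?, 10, 1) % 2 = 0", (ssn, ssn)
    ).fetchone()
    return digit, bool(is_even)


eva = tenth_digit_is_even("911212-1746")  # digit '4', even
bo = tenth_digit_is_even("850819-2139")   # digit '3', odd
print(eva, bo)
```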


e) How many students are registered in the database?

In [20]:
%%sql
SELECT COUNT()
FROM students

Done.


COUNT()
72


f) Which courses are offered by the department of
   Mathematics (their course codes have the form `FMAxxx`)?
   [22]

In [34]:
%%sql
SELECT course_name, course_code
FROM   courses
WHERE  course_code LIKE 'FMA%'

Done.


course_name,course_code
Kontinuerliga system,FMA021
Optimering,FMA051
Diskret matematik,FMA091
Matematiska strukturer,FMA111
Matristeori,FMA120
"Matristeori, projektdel",FMA125
Geometri,FMA135
Olinjära dynamiska system,FMA140
"Olinjära dynamiska system, projektdel",FMA145
Bildanalys,FMA170


g) Which courses give more than 7.5 credits? [16]

In [65]:
%%sql
SELECT course_name, course_code, credits
FROM   courses
WHERE  credits > 7.5

Done.


course_name,course_code,credits
Coachning av programvaruteam,EDA270,9.0
Datorer i system,EDAA05,8.0
Tillämpad mekatronik,EIEF01,10.0
"Mekatronik, industriell produktframtagning",EIEN01,10.0
Digitalteknik,EIT020,9.0
Digitala bilder – kompression,EITF01,9.0
Elektromagnetisk fältteori,ESS050,9.0
Elektronik,ETIA01,8.0
Introduktionskurs i kinesiska för civilingenjörer,EXTA35,15.0
"Introduktionskurs i kinesiska för civilingenjörer, del 2",EXTF60,15.0


h) How many courses are there for each level (`G1`, `G2`, and
   `A`)?

In [44]:
%%sql
SELECT level, COUNT()
FROM   courses
GROUP BY level

Done.


level,COUNT()
A,87
G1,31
G2,60


i) Which courses (course codes only) have been taken by the
   student with social security number 910101-1234? [35]

In [47]:
%%sql
SELECT course_code
FROM taken_courses
WHERE ssn = '910101-1234'

Done.


course_code
EDA070
EDA385
EDAA25
EDAF05
EEMN10
EIT020
EIT060
EITF40
EITN40
EITN50


j) What are the names of these courses, and how many credits
   do they give?

In [68]:
%%sql
SELECT course_name, credits
FROM courses as c
JOIN taken_courses
USING (course_code)
WHERE ssn = '910101-1234'

Done.


course_name,credits
Datorer och datoranvändning,3.0
"Konstruktion av inbyggda system, fördjupningskurs",7.5
C-programmering,3.0
"Algoritmer, datastrukturer och komplexitet",5.0
Datorbaserade mätsystem,7.5
Digitalteknik,9.0
Datasäkerhet,7.5
Digitala och analoga projekt,7.5
Avancerad webbsäkerhet,4.0
Avancerad datasäkerhet,7.5


k) How many credits has the student taken?

In [59]:
%%sql
SELECT sum(credits)
FROM courses as c
JOIN (SELECT course_code
      FROM taken_courses
      WHERE ssn = '910101-1234')
USING (course_code)

Done.


sum(credits)
249.5


l) What is the student’s grade average?

In [75]:
%%sql
SELECT avg(grade)
FROM courses as c
JOIN (SELECT *
      FROM taken_courses
      WHERE ssn = '910101-1234')
USING (course_code)

Done.


avg(grade)
4.0285714285714285


m) Which students have taken 0 credits? [11]

In [125]:
%%sql
SELECT first_name
FROM students
WHERE ssn NOT IN (
    SELECT ssn
    FROM taken_courses)

Done.


first_name
Anna
Caroline
Bo
Erik
Erik
Johan
Filip
Jonathan
Magnus
Joakim


n) List the names and average grades of the 10 students with
   the highest grade average.

In [70]:
%%sql
SELECT avg(grade), first_name
FROM taken_courses
JOIN students
USING (ssn)
GROUP BY ssn
ORDER BY avg(grade) DESC
LIMIT 10

Done.


avg(grade),first_name
4.35,Bo
4.307692307692308,Helena
4.235294117647059,Elaine
4.230769230769231,Anna
4.21875,Ylva
4.2,Anna
4.173913043478261,Mikael
4.166666666666667,Jakob
4.157894736842105,Maria
4.153846153846154,Per-Erik


o) List the social security number and total number of
   credits for all students. Students with no credits should
   be included with 0 credits, not `NULL`. If you do this with
   an outer join you might want to use the function
   `COALESCE(v1, v2, ...)`; it returns the first value which
   is not `NULL`. (It is a little bit tricky to get this
   query right; if you're missing the students with 0
   credits, don't worry, your TA will help you get it
   right.) [72]

In [74]:
%%sql
SELECT    first_name, last_name, COALESCE(SUM(credits), 0) AS credits
FROM      students
LEFT JOIN taken_courses
USING     (ssn)
LEFT JOIN courses
USING     (course_code)
GROUP BY  ssn

Done.


first_name,last_name,credits
Henrik,Berg,166.5
Filip,Persson,0.0
Ulrika,Jonsson,30.0
Bo,Ek,76.5
Eva,Hjort,151.0
Niklas,Andersson,70.5
Maria,Andersson,140.5
Bo,Ek,153.0
Caroline,Olsson,0.0
Marie,Persson,254.0


p) Is there more than one student with the same name? If so,
   who are these students and what are their social security
   numbers? [7]

In [46]:
%%sql
SELECT DISTINCT s1.first_name, s2.last_name, s1.ssn
FROM students AS s1
JOIN students AS s2
ON s1.first_name = s2.first_name AND s1.last_name = s2.last_name AND s1.ssn != s2.ssn

Done.


first_name,last_name,ssn
Anna,Johansson,950705-2308
Anna,Johansson,930702-3582
Bo,Ek,861103-2438
Bo,Ek,931225-3158
Bo,Ek,850819-2139
Erik,Andersson,891220-1393
Erik,Andersson,900313-2257


q) What 5 courses have the highest grade average?

In [4]:
%%sql
SELECT avg(grade), course_name
FROM courses
JOIN taken_courses
USING (course_code)
GROUP BY course_name
ORDER BY avg(grade) DESC
LIMIT 5

Done.


avg(grade),course_name
4.75,Digitala och analoga projekt
4.6,Signalbehandling i multimedia
4.571428571428571,Avancerad interaktionsdesign
4.571428571428571,Medicinsk signalbehandling
4.5,"Franska för tekniker: språk, kultur och samhällsliv, grundkurs"


r) (Not required) What are the 'best' three first initial
   letters of the last names, i.e., if you take the average
   grades for each first letter of the last name, which
   three initials have the highest averages?

In [78]:
%%sql
SELECT avg(grade), SUBSTR(last_name, 1, 1) as Firstletter
FROM taken_courses
JOIN students
USING (ssn)
GROUP BY (SUBSTR(last_name, 1, 1))
ORDER BY avg(grade) DESC
LIMIT 3

Done.


avg(grade),Firstletter
4.307692307692308,T
4.2,J
4.146341463414634,C
