# YOUR PROJECT TITLE

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code from [lecture 5](https://numeconcopenhagen.netlify.com/lectures/Workflow_and_debugging).
> 1. Remember this [guide](https://www.markdownguide.org/basic-syntax/) on Markdown and (a bit of) LaTeX.
> 1. Turn on automatic numbering by clicking on the small icon on top of the table of contents in the left sidebar.
> 1. The `dataproject.py` file includes a function which will be used multiple times in this notebook.

Imports and set magics:

In [30]:
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
import numpy as np
import datetime as dt

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# local modules
import dataproject

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Read and clean data

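Import the Excel file with the variable descriptions, then the data. The description file is in Danish; it defines the columns ATC (drug classification set by the WHO), Lægemiddel (trade name), Varenummer (item number), Pakning (package size), Styrke (strength), Form (dosage form), and Firma (distributor).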

In [16]:
meddef = pd.read_excel('medpriser.xlsx')
meddef

Unnamed: 0.1,Unnamed: 0,Unnamed: 1
0,På baggrund af data fra Lægemiddelstyrelsen op...,
1,,
2,Ord og begreber i opgørelsen:,
3,ATC,Lægemiddelklassifikation som fastsættes af WHO...
4,Lægemiddel,Lægemidlets handelsnavn. Navnet fra den senest...
5,Varenummer,Lægemidlets varenummer som også er trykt på pa...
6,Pakning,"Pakningsstørrelse, f.eks. 100 stk."
7,Styrke,"Lægemidlets styrke, f.eks. 5 mg eller 0,01%. F..."
8,Form,"Lægemiddelform, f.eks. tabletter, kapsler, mik..."
9,Firma,Distributør af lægemidlet.


In [70]:
medprice = pd.read_excel('meddata.xlsx')

Clean data: are there any missing values?

In [73]:
medprice.head()

Unnamed: 0,ATC,Lægemiddel,Varenummer,Pakning,Styrke,Form,Firma,Indikator,20150202,20150216,...,20191118,20191202,20191216,20191230,20200113,20200127,20200210,20200224,20200309,20200323
0,A01AA01,Bifluorid,42846,4 g + solvens,,dentalsuspension,Voco,AIP,407.36,407.36,...,,,,,,,,,,
1,A01AA01,Bifluorid,42846,4 g + solvens,,dentalsuspension,Voco,AUP,570.25,570.25,...,,,,,,,,,,
2,A01AA01,Bifluorid,42846,4 g + solvens,,dentalsuspension,Voco,DDD,,,...,,,,,,,,,,
3,A01AA01,Bifluorid,42846,4 g + solvens,,dentalsuspension,Voco,AUP_pr_DDD,,,...,,,,,,,,,,
4,A01AA01,Bifluorid,43158,10 g,,dentalsuspension,Voco,AIP,602.07,602.07,...,,,,,,,,,,


In [23]:
# Many values in the dataset are missing. This can occur because firms no longer bid in the auctions.
# We need to check that pandas correctly treats these values as missing.
medprice.isnull()

Unnamed: 0,ATC,Lægemiddel,Varenummer,Pakning,Styrke,Form,Firma,Indikator,20150202,20150216,...,20191118,20191202,20191216,20191230,20200113,20200127,20200210,20200224,20200309,20200323
0,False,False,False,False,True,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,True
1,False,False,False,False,True,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,True
2,False,False,False,False,True,False,False,False,True,True,...,True,True,True,True,True,True,True,True,True,True
3,False,False,False,False,True,False,False,False,True,True,...,True,True,True,True,True,True,True,True,True,True
4,False,False,False,False,True,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64863,False,False,False,False,False,False,False,False,True,True,...,True,True,True,True,True,True,True,True,True,True
64864,False,False,False,False,False,False,False,False,True,True,...,True,True,True,True,True,True,True,True,True,True
64865,False,False,False,False,False,False,False,False,True,True,...,True,True,True,True,True,True,True,True,True,True
64866,False,False,False,False,False,False,False,False,True,True,...,True,True,True,True,True,True,True,True,True,True


The isnull method correctly recognizes the missing values of the medicine prices.
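The full boolean table is hard to scan at roughly 65,000 rows, so a per-column summary can make the same check easier to read. The following is a minimal sketch on a made-up toy frame (`toy`, `missing_count`, `missing_share`, and `all_missing` are illustrative names, not part of the project); the same calls should apply to `medprice`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for medprice (hypothetical values; column names follow the output above)
toy = pd.DataFrame({
    'Styrke': [np.nan, np.nan, '5 mg'],
    20150202: [407.36, 570.25, np.nan],
    20200323: [np.nan, np.nan, np.nan],
})

missing_count = toy.isnull().sum()   # number of missing cells per column
missing_share = toy.isnull().mean()  # fraction of missing cells per column

# Columns in which every value is missing
all_missing = missing_share[missing_share == 1.0].index.tolist()

print(missing_count)
print(all_missing)
```

Sorting `missing_share` in descending order would show which price dates carry the least information.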

Converting the data to long format

In [45]:
drops = ['Varenummer'] 
medprice.drop(drops, axis=1, inplace=True)
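Because `inplace=True` mutates `medprice`, running the cell above a second time raises a `KeyError`, since the column is already gone. A re-runnable variant might look like the sketch below, shown on a toy frame (`toy` is a hypothetical stand-in for `medprice`).

```python
import pandas as pd

# Toy frame with the same column name as in the cell above (hypothetical values)
toy = pd.DataFrame({'ATC': ['A01AA01'], 'Varenummer': [42846], 'Firma': ['Voco']})

drops = ['Varenummer']

# Reassigning the result and passing errors='ignore' makes the step safe to re-run
toy = toy.drop(columns=drops, errors='ignore')
toy = toy.drop(columns=drops, errors='ignore')  # second run is a no-op, no KeyError

print(toy.columns.tolist())
```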

In [1]:
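# Assumed intent: keep only lamotrigine (ATC N03AX09) in 50-piece blister packs
# from the three listed brands. Calling drop() on filtered copies without
# assigning the result leaves medprice unchanged, so a boolean mask is used instead.
# Note: the cells that load medprice must be run first; in a fresh kernel
# this cell raises a NameError.
brands = ['Lamotrigin "1A Farma"', 'Lamotrigin "Stada"', 'Lamotrigin "Teva"']
keep = (
    (medprice.ATC == 'N03AX09')
    & (medprice.Pakning == '50 stk. (blister)')
    & medprice.Lægemiddel.isin(brands)
)
medprice = medprice[keep].copy()
medprice.head()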

NameError: name 'medprice' is not defined

In [38]:
#med_long = medprice.set_index('ATC')
#med_long.index = pd.to_datetime(med_long.index)
#del med_long.index.name
#med_long
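The commented-out attempt above sets `ATC` as the index and then parses it as dates, which would not work because the dates are stored in the column names, not in `ATC`. One possible way to reach long format is `DataFrame.melt`, sketched here on a toy frame built from the first rows shown earlier. The names `wide`, `id_vars`, `date`, and `price` are assumptions for illustration, and the date columns are assumed to be integers such as `20150202`.

```python
import pandas as pd

# Toy wide-format frame mimicking the layout of medprice
wide = pd.DataFrame({
    'ATC': ['A01AA01', 'A01AA01'],
    'Lægemiddel': ['Bifluorid', 'Bifluorid'],
    'Indikator': ['AIP', 'AUP'],
    20150202: [407.36, 570.25],
    20150216: [407.36, 570.25],
})

# Identifier columns stay as they are; every other column is treated as a date
id_vars = ['ATC', 'Lægemiddel', 'Indikator']
med_long = wide.melt(id_vars=id_vars, var_name='date', value_name='price')

# Column labels such as 20150202 become proper timestamps
med_long['date'] = pd.to_datetime(med_long['date'].astype(str), format='%Y%m%d')

# Drop rows without a price (many date columns are empty in the real data)
med_long = med_long.dropna(subset=['price'])

print(med_long)
```

With the real data, `id_vars` would need to list every non-date column that remains in `medprice` after cleaning.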