# Data in and anonymisation

In [1]:
import numpy as np
import pandas as pd

<mark>Jump right into analyses from here:</mark>

[Counting values from individual registrations](#1)

# Anonymisation

Read the Indico CSV exports from the local drive

In [11]:
individualDF_org = pd.read_csv("/Users/mjaaske/CSCPurkki/CodeRefinery_Purkki/Stats/CR_Individual_2022_10_13.csv", index_col='ID')
teamsDF_org = pd.read_csv("/Users/mjaaske/CSCPurkki/CodeRefinery_Purkki/Stats/CR_Team_2022_10_13.csv", index_col='ID')

In [4]:
# Drop columns that identify people (name, email) or that contain only NaN
indy_droplabels = ["Name", "Email Address", "Title", "Position", "Tags", "Other notes to organizers"]
team_droplabels = ["Name", "Email Address", "Title", "Tags", "Alternative way of submitting results (link)"]

individualDF_org.drop(labels=indy_droplabels, axis=1, inplace=True)
teamsDF_org.drop(labels=team_droplabels, axis=1, inplace=True)
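A quick check that no identifying column survives the drop could look like the sketch below. The toy frame and the `identifying` list are assumptions for illustration only; the real check would run on the frames read from the Indico exports.

```python
import pandas as pd

# Toy stand-in for the Indico export; all values are made up for illustration.
df = pd.DataFrame(
    {
        "Name": ["A", "B"],
        "Email Address": ["a@example.org", "b@example.org"],
        "Country": ["Finland", "Sweden"],
    },
    index=pd.Index([16, 17], name="ID"),
)

# Columns treated as identifying here (an assumption based on the drop lists above).
identifying = ["Name", "Email Address"]

# errors="ignore" skips labels that are absent, so one list can serve both exports.
anonymised = df.drop(columns=identifying, errors="ignore")

# Fail loudly if anything identifying is still present before exporting.
leftover = set(identifying) & set(anonymised.columns)
assert not leftover, f"identifying columns still present: {leftover}"
print(list(anonymised.columns))
```

Running a check like this right before the export step makes it harder to commit personal data to the repository by accident.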

Check the remaining columns

In [5]:
individualDF_org.columns

Index(['Country', 'Affiliation or university', 'Academic discipline',
       'Career stage / position', 'How did you find out about this workshop?',
       'Attendance/participation type',
       'Stream only or also exercise group? Video or in-person?',
       'Which days you plan to attend?', 'Registration date',
       'Registration state'],
      dtype='object')

In [6]:
teamsDF_org.columns

Index(['Affiliation', 'Actual number of teams', 'Actual number of learners',
       'Actual number of exercise leads',
       'Country/countries that learners are affilated with',
       'Career stage/ position of learners', 'Academic discipline',
       'Academic disciplines (if multiple teams from different disciplines)',
       'Registration date', 'Registration state'],
      dtype='object')

Sort by ID and export to CSV files inside the repository

In [7]:
individualDF_org.sort_index(inplace=True)
teamsDF_org.sort_index(inplace=True)

```python
individualDF_org.to_csv('./CR_individual.csv')
teamsDF_org.to_csv('./CR_teams.csv')
```

## Individual registrations
<a id =1> </a>

Read the anonymised CSV back in from the repository

In [9]:
individualDF = pd.read_csv("./CR_individual.csv", index_col="ID")

A summary of the data frame

In [14]:
individualDF.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 204 entries, 16 to 252
Data columns (total 10 columns):
 #   Column                                                   Non-Null Count  Dtype 
---  ------                                                   --------------  ----- 
 0   Country                                                  204 non-null    object
 1   Affiliation or university                                204 non-null    object
 2   Academic discipline                                      186 non-null    object
 3   Career stage / position                                  186 non-null    object
 4   How did you find out about this workshop?                183 non-null    object
 5   Attendance/participation type                            204 non-null    object
 6   Stream only or also exercise group? Video or in-person?  204 non-null    object
 7   Which days you plan to attend?                           204 non-null    object
 8   Registration date                      

A closer look

In [16]:
individualDF.head(2)

| ID | Country | Affiliation or university | Academic discipline | Career stage / position | How did you find out about this workshop? | Attendance/participation type | Stream only or also exercise group? Video or in-person? | Which days you plan to attend? | Registration date | Registration state |
|---|---|---|---|---|---|---|---|---|---|---|
| 16 | Norway | UiT The Arctic University of Norway | Chemical Sciences | Research software engineer | Another event | Organizer | I am an organizer or instructor or observer | Tue, Sep 20 (Day 1); Wed, Sep 21 (Day 2); Thu,... | 2022-07-04 07:35:21.871298+00:00 | Completed |
| 17 | Sweden | Chalmers University of Technology | Mechanical Engineering | Researcher | CodeRefinery.org; Educational institute | Learner | Watching stream and participate in online exer... | Tue, Sep 20 (Day 1); Wed, Sep 21 (Day 2); Thu,... | 2022-07-04 08:57:15.110384+00:00 | Completed |


Count the values in each column

In [31]:
countCoutries = individualDF["Country"].value_counts()
countOrganisations = individualDF["Affiliation or university"].value_counts()
countAcademic = individualDF["Academic discipline"].value_counts()
countCareer = individualDF["Career stage / position"].value_counts()
countFindOut = individualDF["How did you find out about this workshop?"].value_counts()
countAttendaceType1 = individualDF["Attendance/participation type"].value_counts()
countAttendaceType2 = individualDF["Stream only or also exercise group? Video or in-person?"].value_counts()
countAttendaceDays = individualDF["Which days you plan to attend?"].value_counts()
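Note that `value_counts()` leaves out missing values by default, and the `info()` summary above shows that some columns have fewer than 204 non-null entries (for example 186 for "Academic discipline"). A minimal sketch of counting every column in one go while keeping the unanswered rows visible, using a toy frame in place of `individualDF`:

```python
import pandas as pd

# Toy data; the real frame is individualDF read from CR_individual.csv.
toy = pd.DataFrame(
    {
        "Country": ["Finland", "Sweden", "Finland"],
        "Academic discipline": ["Mathematics", None, "Mathematics"],
    }
)

# One value_counts per column, keyed by column name.
# dropna=False adds a NaN row so unanswered questions show up in the counts.
counts = {col: toy[col].value_counts(dropna=False) for col in toy.columns}

print(counts["Academic discipline"])
```

Keeping the counts in a dictionary is only one option; separate variables per column, as in the cell above, work just as well for a handful of columns.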

### Country

This can be plotted right away

In [41]:
countCoutries

Finland              61
Sweden               50
Norway               38
Spain                12
Denmark               7
Netherlands           5
India                 4
Ireland               3
Switzerland           3
United Kingdom        3
Italy                 3
France                2
United States         2
Trinidad & Tobago     1
Singapore             1
China                 1
Thailand              1
Greece                1
Iran                  1
Turkey                1
Brazil                1
Indonesia             1
Germany               1
Canada                1
Name: Country, dtype: int64
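As a sketch of what "plotted right away" could mean, the counts can go straight into a bar chart. This assumes matplotlib is installed; the series below copies only the first five rows of the output above, and the output file name is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the country counts shown above.
counts = pd.Series({"Finland": 61, "Sweden": 50, "Norway": 38, "Spain": 12, "Denmark": 7})

fig, ax = plt.subplots()

# Horizontal bars keep long labels readable; ascending sort puts the largest bar on top.
counts.sort_values().plot.barh(ax=ax)
ax.set_xlabel("Registrations")
ax.set_title("Individual registrations by country")
fig.tight_layout()
fig.savefig("countries.png")  # hypothetical file name; in a notebook the figure also renders inline
```

The same pattern should work for the other columns marked as ready to plot, since each of them is also a plain `value_counts()` series.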

### Affiliation or university

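Some organisations appear under several spellings (for example "NTNU" and "Norwegian University of Science and Technology (NTNU)", or "University of Oslo" and "UiO"), so the names need to be normalised before the counts are meaningful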

In [42]:
countOrganisations

Aalto University                                         32
NTNU                                                     10
Uppsala University                                        9
SMHI                                                      9
University of Oslo                                        6
                                                         ..
Norwegian University of Science and Technology (NTNU)     1
Justervesenet-Norwegian Metrology Service                 1
University of South Eastern Norway                        1
Technical University of Chalmers                          1
UiO                                                       1
Name: Affiliation or university, Length: 107, dtype: int64
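One possible way to handle the variants is a hand-written alias table applied before counting. The sketch below is an assumption about how this could be done, not a finished mapping: `aliases` lists only the two variants visible in the output above and would have to be extended after inspecting all 107 distinct values.

```python
import pandas as pd

# Spelling variants taken from the output above; the choice of canonical name is an assumption.
aliases = {
    "Norwegian University of Science and Technology (NTNU)": "NTNU",
    "UiO": "University of Oslo",
}

# Toy sample of affiliation answers, including one with stray whitespace.
affiliations = pd.Series(
    [
        "NTNU",
        "Norwegian University of Science and Technology (NTNU)",
        "UiO",
        "University of Oslo",
        " SMHI ",
    ]
)

# Trim whitespace first, then map known variants to one canonical name.
# Values without an entry in aliases pass through unchanged.
cleaned = affiliations.str.strip().replace(aliases)

print(cleaned.value_counts())
```

An explicit table is tedious to maintain but keeps every merge decision visible, which matters when two similar names might in fact be different organisations.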

### Academic discipline

This can be plotted right away

In [44]:
countAcademic

Computer and Information Sciences                                          31
Physical Sciences                                                          28
Biological Sciences                                                        22
Chemical Sciences                                                          18
Earth and Related Environmental Sciences                                   15
Electrical Engineering, Electronic Engineering, Information Engineering    14
Mechanical Engineering                                                     13
Civil Engineering                                                           8
Health Sciences                                                             6
Psychology                                                                  4
Other Engineering and Technologies                                          4
Mathematics                                                                 4
Medical Engineering                                             

### Career stage / position

This can be plotted right away

In [46]:
countCareer

Graduate student              72
Researcher                    37
Postdoc                       26
Research software engineer    17
Other                         12
Professor                      9
Industry                       8
Undergrad. student             5
Name: Career stage / position, dtype: int64

### How did you find out about this workshop?

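Respondents could pick several answers, which are stored in one cell joined by "; ", so the values have to be split into single options before counting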

In [48]:
countFindOut

Friend / Colleague                                                                   67
Educational institute                                                                34
CodeRefinery.org                                                                     19
National HPC center                                                                  18
Another event                                                                        15
Twitter                                                                              15
CodeRefinery.org; Educational institute                                               2
Educational institute; Friend / Colleague                                             2
CodeRefinery.org; National HPC center; Friend / Colleague                             1
Twitter; Another event                                                                1
Friend / Colleague; Another event                                                     1
Educational institute; National 
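A minimal sketch of counting each option on its own, assuming "; " is always the separator between options. The sample answers follow the form of the output above; the series is toy data, not the real column.

```python
import pandas as pd

# A few answers in the same "; "-separated form as the output above, plus a missing one.
answers = pd.Series(
    [
        "Friend / Colleague",
        "CodeRefinery.org; Educational institute",
        "Educational institute; Friend / Colleague",
        None,
    ]
)

# Split each answer into a list, give every option its own row, then count.
options = answers.dropna().str.split("; ").explode()
per_option = options.value_counts()

print(per_option)
```

The totals then count mentions rather than respondents, so they can add up to more than the number of registrations.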

### Attendance/participation type

This can be plotted right away

In [50]:
countAttendaceType1

Learner            185
Instructor           6
Observer             5
Organizer            3
Expert helper        3
Exercise leader      2
Name: Attendance/participation type, dtype: int64

### Stream only or also exercise group? Video or in-person?

This can be plotted right away

In [52]:
countAttendaceType2

Only watching the stream                                                         90
Watching stream and participate in online exercise group (Code Refinery Zoom)    62
I would like to get more information and decide later                            29
I would like to watch and exercise with others in-person                         14
I am an organizer or instructor or observer                                       9
Name: Stream only or also exercise group? Video or in-person?, dtype: int64

### Which days you plan to attend?

This is tricky and potentially not worth it.
> It would be nice to know whether attendance differs between the days. We can probably get that from the Twitch stats as well.

In [53]:
countAttendaceDays

Tue, Sep 20 (Day 1); Wed, Sep 21 (Day 2); Thu, Sep 22 (Day 3); Tue, Sep 27 (Day 4); Wed, Sep 28 (Day 5); Thu, Sep 29 (Day 6)                                                        117
Not sure I can attend, but please keep me informed                                                                                                                                   19
Tue, Sep 20 (Day 1); Wed, Sep 21 (Day 2); Thu, Sep 22 (Day 3); Tue, Sep 27 (Day 4); Wed, Sep 28 (Day 5); Thu, Sep 29 (Day 6); Not sure I can attend, but please keep me informed     11
Tue, Sep 27 (Day 4); Wed, Sep 28 (Day 5); Thu, Sep 29 (Day 6)                                                                                                                         8
Tue, Sep 20 (Day 1); Wed, Sep 21 (Day 2); Thu, Sep 22 (Day 3)                                                                                                                         6
Wed, Sep 21 (Day 2); Thu, Sep 22 (Day 3); Tue, Sep 27 (Day 4); Wed, Sep 28 (Day 
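If the per-day numbers turn out to be worth having, one option is to turn each answer into 0/1 indicator columns and sum them. This is a sketch on toy data, assuming "; " separates the options (the day labels themselves contain commas, so a plain comma split would not work).

```python
import pandas as pd

# Answers in the same form as the output above; the rows here are toy data.
days = pd.Series(
    [
        "Tue, Sep 20 (Day 1); Wed, Sep 21 (Day 2); Thu, Sep 22 (Day 3)",
        "Tue, Sep 27 (Day 4); Wed, Sep 28 (Day 5); Thu, Sep 29 (Day 6)",
        "Tue, Sep 20 (Day 1); Not sure I can attend, but please keep me informed",
    ]
)

# get_dummies builds one 0/1 column per distinct option;
# summing the columns gives the number of registrations that ticked each option.
per_day = days.str.get_dummies(sep="; ").sum().sort_index()

print(per_day)
```

The "Not sure I can attend" option ends up as its own column, so it can be dropped or reported separately from the six workshop days.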