Lewis Francis C1826277

# Data Science Portfolio - Part I (30 marks)

In this question you will write Python code for processing, analyzing and understanding the social network **Reddit** (www.reddit.com). Reddit is a platform that allows users to upload posts and comment on them, and is divided in _subreddits_, often covering specific themes or areas of interest (for example, [world news](https://www.reddit.com/r/worldnews/), [ukpolitics](https://www.reddit.com/r/ukpolitics/) or [nintendo](https://www.reddit.com/r/nintendo)). You are provided with a subset of Reddit with posts from Covid-related subreddits (e.g., _CoronavirusUK_ or _NoNewNormal_), as well as randomly selected subreddits (e.g., _donaldtrump_ or _razer_).

The `csv` dataset you are provided contains one row per post, and has information about three entities: **posts**, **users** and **subreddits**. The column names are self-explanatory: columns starting with the prefix `user_` describe users, those starting with the prefix `subr_` describe subreddits, the `subreddit` column is the subreddit name, and the rest of the columns are post attributes (`author`, `posted_at`, `title` and post text - the `selftext` column-, number of comments - `num_comments`, `score`, etc.).

In this exercise, you are asked to perform a number of operations to gain insights from the data.

In [None]:
# suggested imports
import pandas as pd
from nltk.tag import pos_tag, pos_tag_sents
import re
from collections import defaultdict,Counter
from nltk.stem import WordNetLemmatizer
from datetime import datetime
from tqdm import tqdm
import numpy as np
import os
tqdm.pandas()
from ast import literal_eval
# nltk imports, note that these outputs may be different if you are using colab or local jupyter notebooks
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize,sent_tokenize

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [None]:
from urllib import request
import pandas as pd
module_url = f"https://raw.githubusercontent.com/luisespinosaanke/cmt309-portfolio/master/data_portfolio_21.csv"
module_name = module_url.split('/')[-1]
print(f'Fetching {module_url}')
#with open("file_1.txt") as f1, open("file_2.txt") as f2
with request.urlopen(module_url) as f, open(module_name,'w') as outf:
  a = f.read()
  outf.write(a.decode('utf-8'))


df = pd.read_csv('data_portfolio_21.csv')
# this fills empty cells with empty strings
df = df.fillna('')

Fetching https://raw.githubusercontent.com/luisespinosaanke/cmt309-portfolio/master/data_portfolio_21.csv


In [None]:
df.head()

Unnamed: 0,author,posted_at,num_comments,score,selftext,subr_created_at,subr_description,subr_faved_by,subr_numb_members,subr_numb_posts,subreddit,title,total_awards_received,upvote_ratio,user_num_posts,user_registered_at,user_upvote_ratio
0,-Howitzer-,2020-08-17 20:26:04,19,1,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,BREAKING: Trump to begin hiding in mailboxes t...,0,1.0,4661,2012-11-09,-0.658599
1,-Howitzer-,2020-07-06 17:01:48,1,3,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,Joe Biden's America,0,0.67,4661,2012-11-09,-0.658599
2,-Howitzer-,2020-09-09 02:29:02,3,1,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,4 more years and we can erase his legacy for g...,0,1.0,4661,2012-11-09,-0.658599
3,-Howitzer-,2020-06-23 23:02:39,2,1,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,Revelation 9:6 [Transhumanism: The New Religio...,0,1.0,4661,2012-11-09,-0.658599
4,-Howitzer-,2020-08-07 04:13:53,32,622,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,"LOOK HERE, FAT",0,0.88,4661,2012-11-09,-0.658599


## P1.1 - Text data processing (10 marks)

### P1.1.1 - Faved by as lists (3 marks)

The column `subr_faved_by` contains an array of values (names of redditors who added the subreddit to which the current post was submitted), but unfortunately they are in text format, and you would not be able to process them properly without converting them to a suitable python type. You must convert these string values to Python lists, going from

```python
'["user1", "user2" ... ]'
```

to

```python
["user1", "user2" ... ]
```

**What to implement:** Implement a function `transform_faves(df)` which takes as input the original dataframe and returns the same dataframe, but with one additional column called `subr_faved_by_as_list`, where you have the same information as in `subr_faved_by`, but as a python list instead of a string.

In [None]:
def transform_faves(df):
  myList = []
  dataMatching = df["subr_faved_by"]
  count_row = dataMatching.shape[0]

  for i in range(0,count_row):
    selectedRow = dataMatching.iloc[i].replace("[","").replace("]","").replace("'","")
    splitRows = selectedRow.split(",")
    myList.append(splitRows)

  df2 = pd.DataFrame({"subr_faved_by_as_list":myList})
  df2.index = df.index
  df = pd.concat([df, df2],  axis=1)


  return df
df = transform_faves(df)

In [None]:
df

Unnamed: 0,author,posted_at,num_comments,score,selftext,subr_created_at,subr_description,subr_faved_by,subr_numb_members,subr_numb_posts,subreddit,title,total_awards_received,upvote_ratio,user_num_posts,user_registered_at,user_upvote_ratio,subr_faved_by_as_list
0,-Howitzer-,2020-08-17 20:26:04,19,1,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,BREAKING: Trump to begin hiding in mailboxes t...,0,1.00,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal..."
1,-Howitzer-,2020-07-06 17:01:48,1,3,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,Joe Biden's America,0,0.67,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal..."
2,-Howitzer-,2020-09-09 02:29:02,3,1,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,4 more years and we can erase his legacy for g...,0,1.00,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal..."
3,-Howitzer-,2020-06-23 23:02:39,2,1,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,Revelation 9:6 [Transhumanism: The New Religio...,0,1.00,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal..."
4,-Howitzer-,2020-08-07 04:13:53,32,622,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,donaldtrump,"LOOK HERE, FAT",0,0.88,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19935,zqrwiel,2020-07-23 16:39:15,11,246,,2009-04-13,A subreddit dedicated to the discussion of hip...,"['solex125', 'redreddington22', 'HibikiSS', 'k...",8740,630857,playboicarti,carti why,0,1.00,1883,2014-02-12,0.861626,"[solex125, redreddington22, HibikiSS, klond..."
19936,zqrwiel,2020-12-15 11:25:07,39,1,"Then I think we might get 18 songs, outro usua...",2009-04-13,A subreddit dedicated to the discussion of hip...,"['solex125', 'redreddington22', 'HibikiSS', 'k...",8740,630857,playboicarti,If uzi on track 3 and 16,0,1.00,1883,2014-02-12,0.861626,"[solex125, redreddington22, HibikiSS, klond..."
19937,zqrwiel,2020-12-27 13:57:49,15,1,He has 25songs to perform plus the additional ...,2009-04-13,A subreddit dedicated to the discussion of hip...,"['solex125', 'redreddington22', 'HibikiSS', 'k...",8740,630857,playboicarti,Man carti’s concerts are gonna be long af,0,1.00,1883,2014-02-12,0.861626,"[solex125, redreddington22, HibikiSS, klond..."
19938,zqrwiel,2020-12-29 12:07:10,6,1,I got goose[***]ps just by thinking about it 😬,2009-04-13,A subreddit dedicated to the discussion of hip...,"['solex125', 'redreddington22', 'HibikiSS', 'k...",8740,630857,playboicarti,Can’t wait to see Carti going full rage mode o...,0,1.00,1883,2014-02-12,0.861626,"[solex125, redreddington22, HibikiSS, klond..."


### P1.1.2 - Merge titles and text bodies (4 marks)

All Reddit posts need to have a title, but a text body is optional. However, we want to be able to access all free text information for each post without having to look at two columns every time.

**What to implement**: A function `concat(df)` that will take as input the original dataframe and will return it with an additional column called `full_text`, which will concatenate `title` and `selftext` columns, but with the following restrictions:

- 1) Wrap the title between `<title>` and `</title>` tags.
- 2) Add a new line (`\n`) between title and selftext, but only in cases where you have both values (see instruction 4).
- 3) Wrap the selftext between `<selftext>` and `</selftext>`.
- 4) You **must not** include the tags in points (1) or (3) if the values for these columns is missing. We will consider a missing value either an empty value (empty string) or a string of only one character (e.g., an emoji). Also, the value of a `full_text` column must not end in the new line character.

In [None]:
def concat(df):
  finalList = []
  countRowConcat = df["title"].shape[0]
  for i in range(0,countRowConcat):
    if len(df["title"].iloc[i])>1:
      testerOne = " <title> " + df["title"][i] + " </title> "
      finalList.append(testerOne)
      if len(df["selftext"].iloc[i])>1:
        testerTwo = " \n " + " <selftext> " + df["selftext"][i] + " </selftext> "
        finalList[i] = finalList[i] + testerTwo

      
    if len(df["title"].iloc[i])<=1:
      finalList.append(df["title"][i])

  

  dfConcat = pd.DataFrame({"full_text":finalList})

  df = pd.concat([df, dfConcat],  axis=1)
  return df
df = concat(df)

### P1.1.3 - Enrich posts (3 marks)

We would like to augment our text data with linguistic information. To this end, we will _tokenize_, apply _part-of-speech tagging_, and then we will _lower case_ all the posts.

**What to implement**: A function `enrich_posts(df)` that will take as input the original dataframe and will return it with **two** additional columns: `enriched_title` and `enriched_selftext`. These columns will contain tokenized, pos-tagged and lower cased versions of the original text. **You must implement them in this order**, because the pos tagger uses casing information.

In [None]:
def enrich_posts(df):
    # your code here
    # LOOK INTO nltk functions!
    df['enriched_title'] = df['title'].apply(word_tokenize) 
    df['enriched_selftext'] = df['selftext'].apply(word_tokenize) 
    df['enriched_title'] = df['enriched_title'].apply(pos_tag)
    df['enriched_selftext'] = df['enriched_selftext'].apply(pos_tag)
    df['enriched_title'] = df['enriched_title'].astype(str) + "lowecase: " + df['title'].str.lower()
    df['enriched_selftext'] = df['enriched_selftext'].astype(str) + "lowecase: " + df['selftext'].str.lower()
    return df

df = enrich_posts(df)
df

Unnamed: 0,author,posted_at,num_comments,score,selftext,subr_created_at,subr_description,subr_faved_by,subr_numb_members,subr_numb_posts,...,title,total_awards_received,upvote_ratio,user_num_posts,user_registered_at,user_upvote_ratio,subr_faved_by_as_list,full_text,enriched_title,enriched_selftext
0,-Howitzer-,2020-08-17 20:26:04,19,1,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,...,BREAKING: Trump to begin hiding in mailboxes t...,0,1.00,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal...",<title> BREAKING: Trump to begin hiding in ma...,"[('BREAKING', 'NN'), (':', ':'), ('Trump', 'NN...",[]lowecase:
1,-Howitzer-,2020-07-06 17:01:48,1,3,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,...,Joe Biden's America,0,0.67,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal...",<title> Joe Biden's America </title>,"[('Joe', 'NNP'), ('Biden', 'NNP'), (""'s"", 'POS...",[]lowecase:
2,-Howitzer-,2020-09-09 02:29:02,3,1,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,...,4 more years and we can erase his legacy for g...,0,1.00,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal...",<title> 4 more years and we can erase his leg...,"[('4', 'CD'), ('more', 'JJR'), ('years', 'NNS'...",[]lowecase:
3,-Howitzer-,2020-06-23 23:02:39,2,1,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,...,Revelation 9:6 [Transhumanism: The New Religio...,0,1.00,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal...",<title> Revelation 9:6 [Transhumanism: The Ne...,"[('Revelation', 'NN'), ('9:6', 'CD'), ('[', 'J...",[]lowecase:
4,-Howitzer-,2020-08-07 04:13:53,32,622,,2009-04-29,Subreddit about Donald Trump,"['vergil_never_cry', 'Jelegend', 'pianoyeah', ...",30053,796986,...,"LOOK HERE, FAT",0,0.88,4661,2012-11-09,-0.658599,"[vergil_never_cry, Jelegend, pianoyeah, sal...","<title> LOOK HERE, FAT </title>","[('LOOK', 'NNP'), ('HERE', 'NNP'), (',', ','),...",[]lowecase:
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19935,zqrwiel,2020-07-23 16:39:15,11,246,,2009-04-13,A subreddit dedicated to the discussion of hip...,"['solex125', 'redreddington22', 'HibikiSS', 'k...",8740,630857,...,carti why,0,1.00,1883,2014-02-12,0.861626,"[solex125, redreddington22, HibikiSS, klond...",<title> carti why </title>,"[('carti', 'NN'), ('why', 'WRB')]lowecase: car...",[]lowecase:
19936,zqrwiel,2020-12-15 11:25:07,39,1,"Then I think we might get 18 songs, outro usua...",2009-04-13,A subreddit dedicated to the discussion of hip...,"['solex125', 'redreddington22', 'HibikiSS', 'k...",8740,630857,...,If uzi on track 3 and 16,0,1.00,1883,2014-02-12,0.861626,"[solex125, redreddington22, HibikiSS, klond...",<title> If uzi on track 3 and 16 </title> \n...,"[('If', 'IN'), ('uzi', 'JJ'), ('on', 'IN'), ('...","[('Then', 'RB'), ('I', 'PRP'), ('think', 'VBP'..."
19937,zqrwiel,2020-12-27 13:57:49,15,1,He has 25songs to perform plus the additional ...,2009-04-13,A subreddit dedicated to the discussion of hip...,"['solex125', 'redreddington22', 'HibikiSS', 'k...",8740,630857,...,Man carti’s concerts are gonna be long af,0,1.00,1883,2014-02-12,0.861626,"[solex125, redreddington22, HibikiSS, klond...",<title> Man carti’s concerts are gonna be lon...,"[('Man', 'NNP'), ('carti', 'VBZ'), ('’', 'JJ')...","[('He', 'PRP'), ('has', 'VBZ'), ('25songs', 'C..."
19938,zqrwiel,2020-12-29 12:07:10,6,1,I got goose[***]ps just by thinking about it 😬,2009-04-13,A subreddit dedicated to the discussion of hip...,"['solex125', 'redreddington22', 'HibikiSS', 'k...",8740,630857,...,Can’t wait to see Carti going full rage mode o...,0,1.00,1883,2014-02-12,0.861626,"[solex125, redreddington22, HibikiSS, klond...",<title> Can’t wait to see Carti going full ra...,"[('Can', 'MD'), ('’', 'VB'), ('t', 'JJ'), ('wa...","[('I', 'PRP'), ('got', 'VBD'), ('goose', 'JJ')..."


## P1.2 - Answering questions with pandas (12 marks)

In this question, your task is to use pandas to answer questions about the data.

### P1.2.1 - Users with best scores (3 marks)

- Find the users with the highest aggregate scores (over all their posts) for the whole dataset. You should restrict your results to only those whose aggregated score is above 10,000 points, in descending order. Your code should generate a dictionary of the form `{author:aggregated_scores ... }`.

In [None]:
userGroup = df.groupby(["author"])
scorePerUser = userGroup["score"].sum().sort_values(ascending=False)

filt = scorePerUser > 10000

filterUse = scorePerUser.loc[filt]
dfToDict = filterUse.to_dict()
print(dfToDict)

{'DaFunkJunkie': 250375, 'None': 218846, 'SUPERGUESSOUS': 211611, 'jigsawmap': 210824, 'chrisdh79': 143538, 'hildebrand_rarity': 122464, 'iSlingShlong': 118595, 'hilltopye': 81245, 'tefunka': 79560, 'OldFashionedJizz': 64398, 'JLBesq1981': 58235, 'rspix000': 57107, 'Wagamaga': 47989, 'stem12345679': 47455, 'TheJeck': 26058, 'TheGamerDanYT': 25357, 'TrumpSharted': 21154, 'NotsoPG': 18518, 'SonictheManhog': 18116, 'BlanketMage': 13677, 'NewAltWhoThis': 12771, 'kevinmrr': 11900, 'Dajakesta0624': 11613, 'apocalypticalley': 10382}


### P1.2.2 - Awarded posts (3 marks)

Find the number of posts that have received at least one award. Your query should return only one value.

In [None]:
awardSuccessDF = df[df.total_awards_received > 0]
countAwardSuccessDF = awardSuccessDF.shape[0]

print(countAwardSuccessDF)

119


### P1.2.3 Find Covid (3 marks)

Find the name and description of all subreddits where the name starts with `Covid` or `Corona` and the description contains `covid` or `Covid` anywhere. Your code should generate a dictionary of the form#

```python
  {'Coronavirus':'Place to discuss all things COVID-related',
  ...
  }
```

In [None]:
subredditCheckCorona =  df[df['subreddit'].str.contains(r'(?i)^corona(?!$)')]
subredditCheckCovid= df[df['subreddit'].str.contains(r'(?i)^covid(?!$)')]
mergeBothChecks = pd.concat([subredditCheckCovid,subredditCheckCorona])
subrDesCheck =  mergeBothChecks[mergeBothChecks['subr_description'].str.contains(r'(?i)covid(?!$)')]
covidDF = subrDesCheck[['subreddit','subr_description']]
covidFinderDict = covidDF.set_index('subreddit').to_dict()['subr_description']
covidFinderDict

{'COVID19': 'In December 2019, SARS-CoV-2, the virus causing the disease COVID-19, emerged in the city of Wuhan, China. This subreddit seeks to facilitate scientific discussion of this global public health threat.',
 'Coronavirus': 'Place to discuss all things COVID-related',
 'CoronavirusCA': 'Tracking the Coronavirus/Covid-19 outbreak in California',
 'CoronavirusDownunder': 'This subreddit is a place to share news, information, resources, and support that relate to the novel coronavirus SARS-CoV-2 and the disease it causes called COVID-19. The primary focus of this sub is to actively monitor the situation in Australia, but all posts on international news and other virus-related topics are welcome, to the extent they are beneficial in keeping those in Australia informed.',
 'CoronavirusUS': 'USA/Canada specific information on the coronavirus (SARS-CoV-2) that causes coronavirus disease 2019 (COVID-19)'}

### P1.2.4 - Redditors that favorite the most

Find the users that have favorited the largest number of subreddits. You must produce a pandas dataframe with **two** columns, with the following format:

```python
     redditor	    numb_favs
0	user1           7
1	user2           6
2	user3	       5
3	user4           4
...
```

where the first column is a Redditor username and the second column is the number of distinct subreddits he/she has favorited.

In [None]:
subrFavSeries = pd.Series(df['subr_faved_by_as_list'])
subrFavSeries = subrFavSeries.drop_duplicates(keep='first')


oneDList = [item for sublist in subrFavSeries for item in sublist]
users = list((pd.unique(oneDList)))
favourited =[]
for i in users:
  favourited.append(oneDList.count(i))

userDict = dict(zip(users,favourited))

sortUser = {x: y for x,y in sorted(userDict.items(), key = lambda i: i[1],reverse = True)}
df_redditors = pd.DataFrame(list(sortUser.items()), columns=["redditor","numb_faves"])
df_redditors

Unnamed: 0,redditor,numb_faves
0,magnusthered15,7
1,KarmaFury,6
2,FriendlyVegetable420,6
3,OmniusQubus,6
4,hmhmhm2,6
...,...,...
1645,certifiedloverboy69,1
1646,diveonfire,1
1647,mouthofreason,1
1648,Alexify,1


## P1.3 Ethics (8 marks)

**(updated on 16/03/2022)**

Imagine you are **the head of a data mining company that needs to use** the insights gained in this assignment to scan social media for covid-related content, and automatically flag it as conspiracy or not conspiracy (for example, for hiding potentially harmful tweets or Facebook posts). **Some information about the project and the team:**

 - Your client is a political party concerned about misinformation.
 - The project requires mining Facebook, Reddit and Instagram data.
 - The team consists of Joe, an American mathematician who just finished college; Fei, a senior software engineer from China; and Francisco, a data scientist from Spain.

Reflect on the impact of exploiting data science for such an application. You should map your discussion to one of the five actions outlined in the UK’s Data Ethics Framework. 

Your answer should address the following:

 - Identify the action **in which your project is the weakest**.
 - Then, justify your choice by critically analyzing the three key principles **for that action** outlined in the Framework, namely transparency, accountability and fairness.
 - Finally, you should propose one solution that explicitly addresses one point related to one of these three principles, reflecting on how your solution would improve the data cycle in this particular use case.

Your answer should be between 500 and 700 words. **You are strongly encouraged to follow a scholarly approach, e.g., with references to peer reviewed publications. References do not count towards the word limit.**

After researching into the UK’s data ethics framework I have found a major weakness with the project and its team, this weakness is 3. Comply with the law. This is a major weakness as not following this may lead to serious repercussions in the future with examples of this being fines of up to 17 million euros or 4% of the company's turnover (¹Wright, n.d.) if you do not follow the GDPR. The reason as to why I decided that this was the biggest weakness is mainly due to the fact that none of the team seem to be qualified to lead and follow these requirements, this is shown by the fact that none of the team specialize in a legal field, this is amplified by the fact that 1 out of the team of 3 have only just finished college and may be even less experienced in the legal field. To go more in depth into the principles within the framework we can see that 3 of the requirements involve consulting with a person with a legal background whether that be a legal advisor in your team, information assurance specialist group in your company or a data protection officer and from the information we have received it would be correct to assume that none of them have done this yet. There are also 3 major aspects of this framework to consider which are transparency, accountability and fairness. It is legally required that the team publish its DPIA as it has the opportunity to be a large risk (²Data protection impact assessments, n.d.) as it is using people's personal data and accounts to potentially block access to speech and therefore not doing this may lead to legal repercussions. As this project is being supported by a political party the fairness of the project may be called into question, it is important when handling and using information that there is no bias being shown especially when discriminitation may be a factor, as of the equality act of 2010 (³Equality Act 2010: guidance, 2013) it is illegal to discriminate against a group of people in both the workplace and in wider society. The final of the 3 major aspects of the framework is accountability, as I do not have enough information in regards to how well documented all of the work is it is difficult to see if they are fulfilling all of the requirements, however as I have previously said since the team does not have an information assurance specialist it would be safe to say that they will not be ensuring that the correct policies and training are in place and therefore may have problems ahead. In addition to this it would also be safe to assume that they will not be following article 30 of the GDPR, specifically subsections 1.(d), 1.(f) and 2.(a) (⁴Art. 30 GDPR – Records of processing activities - General Data Protection Regulation (GDPR), n.d.)  which are all to do with how the data they collected is stored and what data is stored. One crucial proposal of how to fix these issues is by recruiting a legal advisor and an information assurance specialist as this will assist in the majorities of the issues I have found since they will understand more about the legal requirements and how to fulfill them to the required degree.







Citations:
 
¹Wright, L., n.d. The severe ramifications of failing to comply with GDPR. [online] Core.co.uk. Available at: <https://www.core.co.uk/blog/blog/ramifications-failing-comply-with-gdpr> [Accessed 6 May 2022].
 
²Ico.org.uk. n.d. Data protection impact assessments. [online] Available at: <https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/accountability-and-governance/data-protection-impact-assessments/> [Accessed 6 May 2022]
 
³GOV.UK. 2013. Equality Act 2010: guidance. [online] Available at: <https://www.gov.uk/guidance/equality-act-2010-guidance> [Accessed 6 May 2022].
 
⁴General Data Protection Regulation (GDPR). n.d. Art. 30 GDPR – Records of processing activities - General Data Protection Regulation (GDPR). [online] Available at: <https://gdpr-info.eu/art-30-gdpr/> [Accessed 6 May 2022].

