## __Identifying and Defining__

__Data:__ I am analysing data on the ratings of Nintendo games over the last 20 years.\
__Goal:__ I want to find out whether there is a major difference between critic and user ratings for games, and to determine whether critics are biased towards certain genres or game series.\
__Source:__ https://www.kaggle.com/datasets/joebeachcapital/nintendo-games\
__Access:__ The data is publicly available on Kaggle.\
__Access Method:__ The data is contained within a CSV file.

### __Functional Requirements__

__Data Loading:__ The program must load a CSV file and check that the file is the expected dataset and that it is not missing or empty.\
__Description:__ Load the data from the CSV file.\
__Input:__ The Nintendo dataset.\
__Output:__ The Nintendo dataset is loaded into the program.

__Actor:__ User\
__Goal:__ To load the Nintendo Dataset into the system.\
__Preconditions:__ User has access to the Nintendo dataset.\
__Main Flow:__
1. User places the dataset for reading into the correct folder.
1. System validates the file format.
1. System loads the dataset and displays the information in a dataframe.

__Postconditions:__ Dataset is loaded and ready for cleaning.
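
The loading flow above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `load_dataset` is hypothetical, and the expected column names are assumed from the dataframe table later in this document (the real CSV headers may differ, for example in capitalisation).

```python
from pathlib import Path

import pandas as pd

# Assumed column names, taken from the dataframe table in this document.
EXPECTED_COLUMNS = {"Title", "Platform", "Meta_Score", "User_Score", "Genres"}


def load_dataset(path):
    """Load the CSV at `path`, raising a clear error if it is not the expected dataset."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No file found at {path}")
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got '{path.suffix}'")

    # pandas raises EmptyDataError (a ValueError) if the file has no content at all
    df = pd.read_csv(path)

    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Not the expected dataset; missing columns: {sorted(missing)}")
    if df.empty:
        raise ValueError("The file has column headers but no rows")
    return df
```

With the path used elsewhere in this document, a call might look like `load_dataset('Data/NintendoGames.csv')`.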

__Data Cleaning:__ The program must clean the data by removing unnecessary columns and rows with missing data, to avoid errors and make the data more readable.\
__Description:__ Remove all unnecessary and unhelpful information.\
__Input:__ The loaded Nintendo dataset.\
__Output:__ A cleaned, usable dataset, ready for analysis.

__Actor:__ Programmer\
__Goal:__ To clean the Nintendo dataset to improve usability and accuracy.\
__Preconditions:__ User has access to the Nintendo dataset.\
__Main Flow:__
1. The program removes columns with unnecessary information.
1. Removes rows that are missing values.
1. Removes iOS titles.
1. Removes all third-party titles.

__Postconditions:__ Dataset is cleaned and ready for analysis.
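
A rough sketch of the first three cleaning steps is shown here. It assumes a `Platform` column that holds values such as `iOS`; the function name `clean_dataset` and its parameters are hypothetical. Removing third-party titles is left out because this document does not say which column identifies the developer or publisher.

```python
import pandas as pd


def clean_dataset(df, unneeded_columns=("link",), excluded_platforms=("iOS",)):
    """Return a cleaned copy of `df`; the dataframe passed in is left untouched."""
    # errors="ignore" skips any listed column that is not present
    cleaned = df.drop(columns=list(unneeded_columns), errors="ignore")

    # Remove rows that have a missing value in any column
    cleaned = cleaned.dropna()

    # Platform names are cell values, so filter rows instead of dropping a label
    cleaned = cleaned[~cleaned["Platform"].isin(excluded_platforms)]

    return cleaned.drop_duplicates().reset_index(drop=True)
```

Returning a new dataframe instead of using `inplace=True` keeps the loaded data available for comparison with the cleaned result.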

__Data Analysis:__ The program should determine the average critic score and average user score for each genre, the lowest and highest rated games by critics (and the scores they got from users), and the lowest and highest rated games by users (and the scores they got from critics).\
__Description:__ Averages in multiple categories are determined and compared.\
__Input:__ The cleaned Nintendo dataset.\
__Output:__ An analysed dataset, ready to be visualised.

__Actor:__ User\
__Goal:__ To analyse the cleaned Nintendo dataset.\
__Preconditions:__ User has access to the cleaned Nintendo dataset.\
__Main Flow:__
1. The system groups the games by genre, then calculates the average critic score and average user score for each genre.
1. Calculates the overall highest and lowest scores given by critics and by users.
1. Compares the critic score and user score of each of these games.

__Postconditions:__ Dataset is analysed and ready for display.
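
One way the analysis flow could look in pandas is sketched below. It assumes the column names from the dataframe table in this document and that `Genres` is stored as a string such as `"['Action','Metroidvania']"`. The function names and the `Gap` column are illustrative additions; `Gap` rescales user scores from a 0-10 range to 0-100 before subtracting, which is an assumption about the two scales.

```python
import ast

import pandas as pd


def average_scores_by_genre(df):
    """Average critic and user score per genre, counting a game once for each of its genres."""
    # Turn "['Action','Metroidvania']" into a real list, then give each genre its own row
    genres = df["Genres"].apply(lambda g: ast.literal_eval(g) if isinstance(g, str) else g)
    exploded = df.assign(Genres=genres).explode("Genres")

    averages = exploded.groupby("Genres")[["Meta_Score", "User_Score"]].mean()
    # Positive gap: critics rated the genre higher than users (user scores rescaled to 0-100)
    averages["Gap"] = averages["Meta_Score"] - averages["User_Score"] * 10
    return averages


def score_extremes(df, score_column):
    """Return the (lowest, highest) rated rows for `score_column`, with all their other columns."""
    lowest = df.loc[df[score_column].idxmin()]
    highest = df.loc[df[score_column].idxmax()]
    return lowest, highest
```

Because `score_extremes` returns whole rows, the score from the other group can be read straight from the result, for example `score_extremes(df, "Meta_Score")[0]["User_Score"]` for the user score of the critics' lowest rated game.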

__Data Visualisation:__ The program should produce charts that compare values from critics and users, and tables that show the same data.\
__Description:__ Analysed data is displayed to the user.\
__Input:__ The analysed Nintendo dataset.\
__Output:__ A visual that conveys the insight gathered from the analysis of the Nintendo dataset.

__Actor:__ User\
__Goal:__ To visualise the analysed Nintendo dataset.\
__Preconditions:__ User has access to the analysed Nintendo dataset.\
__Main Flow:__
1. The program imports matplotlib.
1. Creates graphs comparing the scores from critics and users.
1. Displays an accompanying table of values.

__Postconditions:__ Dataset is visualised for the user.
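
The visualisation flow might be sketched like this. It assumes a dataframe of per-genre averages with `Meta_Score` and `User_Score` columns, indexed by genre. The function name `plot_genre_comparison` is hypothetical, and multiplying user scores by 10 to put both groups on a 0-100 scale is an assumption based on the `NN` and `N.N` formats in the dataframe table.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_genre_comparison(genre_averages):
    """Draw a grouped bar chart of critic vs user averages; return the figure and its table."""
    # Rescale user scores so both sets of bars share one axis
    comparison = pd.DataFrame({
        "Critics": genre_averages["Meta_Score"],
        "Users (x10)": genre_averages["User_Score"] * 10,
    })

    fig, ax = plt.subplots(figsize=(8, 4))
    comparison.plot(kind="bar", ax=ax)
    ax.set_xlabel("Genre")
    ax.set_ylabel("Average score (out of 100)")
    ax.set_title("Average critic vs user score by genre")
    fig.tight_layout()
    return fig, comparison
```

The returned `comparison` dataframe doubles as the accompanying table of values, and the figure can be written to disk later with `fig.savefig(...)`.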

__Data Reporting:__ The system should output the processed data to a CSV file.\
__Description:__ Processed data is stored in a folder on the user's computer.\
__Input:__ The analysed Nintendo dataset and the visualised Nintendo dataset.\
__Output:__ A folder containing all important information.

__Actor:__ User\
__Goal:__ To record the analysed and visualised Nintendo dataset.\
__Preconditions:__ User has access to the cleaned, analysed and visualised Nintendo dataset.\
__Main Flow:__
1. Create a new folder to store all generated information.
1. Place the cleaned and analysed datasets into the folder.
1. Add the matplotlib graph files as well.

__Postconditions:__ All files are stored in a new folder for the user.
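
A minimal sketch of the reporting step follows. The function name `save_report`, the folder layout, and the file names are all hypothetical; the figures are assumed to be matplotlib `Figure` objects.

```python
from pathlib import Path

import pandas as pd


def save_report(output_dir, tables, figures=None):
    """Write each dataframe to <output_dir>/<name>.csv and each figure to <name>.png."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

    written = []
    for name, df in tables.items():
        target = output_dir / f"{name}.csv"
        # index=False drops the row index; call reset_index() first if it holds data
        df.to_csv(target, index=False)
        written.append(target)

    for name, fig in (figures or {}).items():
        target = output_dir / f"{name}.png"
        fig.savefig(target)
        written.append(target)
    return written
```

A call such as `save_report("Report", {"cleaned": cleaned_df, "genre_averages": averages.reset_index()})` would then produce one CSV per table in a `Report` folder (the variable names here are placeholders).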


### __Non-Functional Requirements__

__Usability:__ What is required from the User Interface and a 'README' document?\
The 'README' document must explain what a user has to do with the dataset to get the program working. The User Interface should be simple, clear and easy to understand, while providing the user with the expected output.

__Reliability:__ What is required from the system when providing information to the user on errors and ensuring data integrity?\
The system should have a fail-safe that detects errors and tells the user when one occurs. The system should make two copies of the original dataset and only edit one, so that the original dataset is never damaged.
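
The reliability requirement could be approached along these lines. This is only a sketch: `make_working_copy` and `run_safely` are hypothetical helper names, and the set of exceptions caught is an assumption about which errors the program is likely to meet.

```python
import shutil
from pathlib import Path


def make_working_copy(original, work_dir="working"):
    """Copy the original dataset into `work_dir` and return the copy's path."""
    original = Path(original)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    copy_path = work_dir / original.name
    shutil.copy2(original, copy_path)  # the original file is only read, never modified
    return copy_path


def run_safely(step_name, func, *args, **kwargs):
    """Run one pipeline step and report common errors to the user instead of crashing."""
    try:
        return func(*args, **kwargs)
    except (FileNotFoundError, ValueError, KeyError) as error:
        print(f"Error during {step_name}: {error}")
        return None
```

Every later step would then read from and write to the working copy, so a failed step can always be retried from the untouched original.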


## __Researching and Planning__


__Purpose:__ The purpose of analysing the ratings of Nintendo games from users and critics over the past 20 years is to determine whether either group is biased towards particular genres and whether critics generally give higher ratings than users.

__Missing Data:__ Currently, the bias of critics and users is not shown. Discovering this potential bias would give a better understanding of how well ratings reflect the quality of video games.

__Stakeholders:__ Consumers will benefit from this information, as it will help them determine whether a game is actually worth buying regardless of its reviews.

__Use:__ Consumers will be able to identify bias towards certain genres, understand that higher ratings don't necessarily indicate a better game, and be better informed about potential bias in reviews of future games.

### __Privacy and Security__

- __Data Privacy of Source:__ Kaggle needs to protect the information on its site from data breaches so that the data stays accurate and safe. Kaggle must also ensure that users' data is not leaked while they use the site and that downloaded files do not contain malicious software.

- __Application Data Privacy:__ There is no personal data contained within the data I have sourced and there is no way to identify an individual from the data I am using alone. If I were to release this application to the general public I would have a responsibility to ensure that user data is not leaked while using the application and that the data is protected and cannot be tampered with.

- __Cyber Security:__ An application should use secure data encryption (scrambling data so that it cannot be read without a key) and back data up regularly so that it can be restored if it is corrupted. User authentication (verifying that a person is who they claim to be, for example with a password or PIN) should be used to protect confidentiality and user data. Password hashing (converting a password into an unrecognisable string before it is stored) should also be used so that passwords cannot be read if a data breach occurs.
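
To illustrate the password hashing idea, here is a small sketch using PBKDF2 from Python's standard library. The function names are hypothetical and the iteration count is only an illustrative value; this project does not currently store any passwords.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=200_000):
    """Return (salt, digest); only these are stored, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # a random salt makes identical passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=200_000):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)
```

Because the hash cannot be reversed, a stolen copy of the stored salts and digests does not directly reveal anyone's password.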


## __My Dataframe__

|Field|Datatype|Format of Display|Description|Example|Validation|
|---|---|---|---|---|---|
|Title|Object|XX...XX|Title of the game|Super Mario Odyssey|Can be any number of characters and can contain numbers, but can't contain 'Wave ' or 'Edition'|
|Platform|Object|XX...XX|The name of the platform that the game was released on|Switch|Must be 6 characters or shorter. Can contain numbers, but no special characters|
|Meta_Score|float64|NN|The rating a game got from critics|84|Must only contain numbers and be 1-2 digits long|
|User_Score|float64|N.N|The rating a game got from users|8.4|Must be a decimal number to one place|
|Genres|Object|['XX...XX','XX...XX']|The genres of the game|['Action','Metroidvania']|Must be in the format ['XX...XX','XX...XX'] and can't contain numbers|
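
The validation rules in the table could be checked with something like the sketch below. The function names are hypothetical, and several details are assumptions: a platform limit of 6 characters (so that the table's own example `Switch` passes), spaces allowed in platform names, and user scores between 0 and 10.

```python
import ast

import pandas as pd


def validate_row(row):
    """Return a list of rule violations for one row; an empty list means the row is valid."""
    problems = []

    title = str(row["Title"])
    if "Wave " in title or "Edition" in title:
        problems.append("Title contains 'Wave ' or 'Edition'")

    platform = str(row["Platform"])
    if len(platform) > 6 or not platform.replace(" ", "").isalnum():
        problems.append("Platform is too long or contains special characters")

    meta = row["Meta_Score"]
    if pd.isna(meta) or meta != int(meta) or not 0 <= meta <= 99:
        problems.append("Meta_Score is not a whole number of 1-2 digits")

    user = row["User_Score"]
    if pd.isna(user) or round(user, 1) != user or not 0 <= user <= 10:
        problems.append("User_Score is not a number to one decimal place")

    try:
        genres = ast.literal_eval(row["Genres"])
    except (ValueError, SyntaxError, TypeError):
        genres = None
    if not isinstance(genres, list) or any(ch.isdigit() for g in genres for ch in str(g)):
        problems.append("Genres is not a list of names without numbers")

    return problems


def invalid_rows(df):
    """Map the index of each invalid row to its list of problems."""
    report = {}
    for index, row in df.iterrows():
        problems = validate_row(row)
        if problems:
            report[index] = problems
    return report
```

Running `invalid_rows` on the loaded dataframe before analysis gives a quick list of the rows that would need fixing or removing.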


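```python
import pandas as pd

# Load the dataset into a dataframe
nintendoGames_df = pd.read_csv('Data/NintendoGames.csv')

# Remove rows with missing values, the unneeded 'link' column, and duplicate rows
nintendoGames_df.dropna(inplace=True)
nintendoGames_df.drop(columns=['link'], inplace=True)
nintendoGames_df.drop_duplicates(inplace=True)

# Display the cleaned dataframe
nintendoGames_df
```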

KeyError: "['iOS'] not found in axis"
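
A `KeyError` of this form is what `DataFrame.drop` raises when it is asked to remove a row or column label that does not exist. One plausible cause, assuming `iOS` appears as a value in a platform column, is passing `'iOS'` to `drop` as if it were a label. The sketch below reproduces the error on made-up rows and shows a row filter as an alternative; the column name `Platform` and the sample titles are placeholders.

```python
import pandas as pd

# Made-up rows for illustration only
games = pd.DataFrame({
    "Title": ["Game A", "Game B"],
    "Platform": ["Switch", "iOS"],
})

# 'iOS' is a cell value, not a column label, so drop() cannot find it
try:
    games.drop(columns=["iOS"])
except KeyError as error:
    message = str(error)

# Keep only the rows whose platform is not iOS
non_ios = games[games["Platform"] != "iOS"]
```

Filtering with a boolean condition works on cell values, which is what removing iOS titles needs.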