# #1: Identifying and Defining
## Data
    I am looking to analyse data about YouTube videos, including performance data (views, likes, comments, etc.; dislike counts have been private since December 2021 and are only visible to a video's owner) and categorical data (video topic, location data, channel category, etc.).
## Goal
    The goal of this analysis is to create a program that takes an input (for example, a country) and displays videos according to a rule (for example, the top 100 performing videos). The program would also be able to display YouTube Shorts data separately, with options such as filtering out remixed videos.
## Source
    https://developers.google.com/youtube/v3
## Access
    This data is publicly available through Google Developers; however, the API limits the number of quota units that can be requested per day.
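
As a rough illustration of how the daily limit constrains the program, the sketch below estimates how many requests fit into one day. The figures are assumptions based on Google's published defaults at the time of writing (10,000 units per day, 1 unit per `videos.list` call, 100 units per `search.list` call) and should be checked against the current quota documentation. The function name is hypothetical.

```python
# Assumed default quota figures; verify against the current API documentation.
DAILY_QUOTA_UNITS = 10_000
COST_PER_CALL = {"videos.list": 1, "search.list": 100}


def max_calls_per_day(method: str, quota: int = DAILY_QUOTA_UNITS) -> int:
    """Return how many calls to `method` fit inside the daily quota."""
    return quota // COST_PER_CALL[method]


if __name__ == "__main__":
    for method in COST_PER_CALL:
        print(method, max_calls_per_day(method))
```

Under these assumptions, search requests use up the quota far faster than video list requests, so the program should prefer `videos.list` where it can.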
## Access Method
    I will access this data using the YouTube Data API, which provides all of the data needed.  
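
A minimal sketch of how a request to the YouTube Data API might be built, assuming the `videos.list` endpoint with `chart=mostPopular` is used to fetch the top videos for a country. The function name and the `YOUTUBE_API_KEY` environment variable are hypothetical; the sketch only builds the URL and does not send the request.

```python
import os
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3/videos"


def build_most_popular_url(region_code: str, api_key: str, max_results: int = 50) -> str:
    """Build a videos.list URL for the most popular videos in a region."""
    # videos.list accepts between 1 and 50 results per page.
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be between 1 and 50")
    params = {
        "part": "snippet,statistics",
        "chart": "mostPopular",
        "regionCode": region_code.upper(),
        "maxResults": max_results,
        "key": api_key,
    }
    return f"{API_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Read the key from the environment so it never appears in the source.
    key = os.environ.get("YOUTUBE_API_KEY", "YOUR_API_KEY")
    print(build_most_popular_url("au", key))
```

Because each page holds at most 50 results, a "top 100" list would need a second request using the `nextPageToken` value returned with the first page.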
## Functional Requirements & Use Cases
-   Data Loading
    -   The program should save the API's response to a request as a .txt file and load the data from that file. Errors, such as a failed request, should be reported to the user and logged for future fixes.
    -   The user will input the data they want to request through the GUI.
    -   The program will output a .txt file of this data to be used for the next steps.
    -   ***Use Case***  
        **Actor**: Program  
        **Goal**: To preload a dataset into the system.  
        **Preconditions**: An API key is supplied.  
        **Main Flow**:  
        Program requests the required data from the YouTube API using an API key.  
        The API returns the requested data.  
        The program loads this data into a dataframe.  
        **Postconditions**: Dataset is loaded and ready for analysis.  
-   Data Cleaning
    -   The system needs to handle issues such as videos or channels that have been made private, and to filter out items such as remixed YouTube Shorts or certain video genres depending on user input.
    -   The user will input their filters of choice into the GUI (e.g. filter out remixes of Shorts).
    -   The program will output data as usual, but with the unwanted results omitted.
    -   ***Use Case***  
        **Actor**: User  
        **Goal**: To filter data from unwanted results.  
        **Preconditions**: User has filters set.  
        **Main Flow**:  
        User assigns the conditions for filtering data.  
        System validates the selected filters.  
        System applies the filters to the dataset and displays the remaining data in a dataframe.  
        **Postconditions**: Dataset is filtered and ready for analysis.  
-   Data Analysis
    -   The program needs to support mean and mode analyses for functions such as average views per day and most popular video in a locale, respectively.
    -   The user will input the value they would like through the GUI (e.g. most popular video in a locale).
    -   The program will output the desired value.
    -   ***Use Case***  
        **Actor**: User  
        **Goal**: To calculate a requested value from the dataset.  
        **Preconditions**: A cleaned dataset is loaded.  
        **Main Flow**:  
        User selects the value they want through the GUI (e.g. most popular video in a locale).  
        System calculates the value from the dataset.  
        System displays the result.  
        **Postconditions**: The requested value is displayed to the user.  
-   Data Visualisation
    -   The program will visualise data as either a matplotlib chart or a pandas dataframe depending on the data.
    -   The user will input data wanted (e.g. views over last month).
    -   The program will output the appropriate visualisation (e.g. line graph of views over time).
    -   ***Use Case***  
        **Actor**: User  
        **Goal**: To visualise data from the dataset.  
        **Preconditions**: A cleaned dataset is loaded.  
        **Main Flow**:  
        User selects the data they want to see (e.g. views over the last month).  
        System chooses a matplotlib chart or a pandas dataframe depending on the data.  
        System displays the visualisation (e.g. a line graph of views over time).  
        **Postconditions**: Dataset is visualised and ready for viewing.  
-   Data Reporting
    -   The system should output only the requested data, stored in a temporary .csv file that the user can download to keep permanently.
    -   The user will request data.
    -   The program will store it in a .csv file and provide a download link.
    -   ***Use Case***  
        **Actor**: User  
        **Goal**: To export the requested data for download.  
        **Preconditions**: A dataset is loaded and the user has requested data.  
        **Main Flow**:  
        User requests data through the GUI.  
        System stores only the requested data in a temporary .csv file.  
        System provides a download link to the file.  
        **Postconditions**: The .csv file is ready for download.  
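
The loading, cleaning, analysis, and reporting requirements above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the final design: it uses plain Python instead of pandas to stay dependency-free, the sample data is made up and only shaped like a `videos.list` response, and all function names are hypothetical.

```python
import csv
import io
import json
from statistics import mean, mode

# Made-up sample shaped like a videos.list response; not real YouTube data.
SAMPLE_RESPONSE = json.dumps({
    "items": [
        {"id": "a1", "snippet": {"title": "Video A", "categoryId": "10"},
         "statistics": {"viewCount": "1000"}, "status": {"privacyStatus": "public"}},
        {"id": "b2", "snippet": {"title": "Video B", "categoryId": "20"},
         "statistics": {"viewCount": "3000"}, "status": {"privacyStatus": "public"}},
        {"id": "c3", "snippet": {"title": "Video C", "categoryId": "10"},
         "statistics": {"viewCount": "500"}, "status": {"privacyStatus": "private"}},
        {"id": "d4", "snippet": {"title": "Video D", "categoryId": "10"},
         "statistics": {}, "status": {"privacyStatus": "public"}},
        {"id": "e5", "snippet": {"title": "Video E", "categoryId": "10"},
         "statistics": {"viewCount": "2000"}, "status": {"privacyStatus": "public"}},
    ]
})

FIELDS = ["id", "title", "category_id", "views", "privacy"]


def load_rows(raw):
    """Data loading: flatten the nested response into one dict per video."""
    rows = []
    for item in json.loads(raw).get("items", []):
        views = item.get("statistics", {}).get("viewCount")
        rows.append({
            "id": item["id"],
            "title": item["snippet"]["title"],
            "category_id": item["snippet"]["categoryId"],
            # Counts arrive as strings and can be absent, so convert carefully.
            "views": int(views) if views is not None else None,
            "privacy": item.get("status", {}).get("privacyStatus", "unknown"),
        })
    return rows


def clean_rows(rows, excluded_categories=()):
    """Data cleaning: keep public videos with a view count, minus excluded categories."""
    return [
        row for row in rows
        if row["privacy"] == "public"
        and row["views"] is not None
        and row["category_id"] not in excluded_categories
    ]


def summarise(rows):
    """Data analysis: mean views, most common category (mode), and top video."""
    return {
        "mean_views": mean(row["views"] for row in rows),
        "most_common_category": mode(row["category_id"] for row in rows),
        "top_video": max(rows, key=lambda row: row["views"])["title"],
    }


def to_csv(rows):
    """Data reporting: serialise the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    cleaned = clean_rows(load_rows(SAMPLE_RESPONSE))
    print(summarise(cleaned))
    print(to_csv(cleaned))
```

In the real program the same steps would likely operate on a pandas dataframe, with the CSV text written to a temporary file and the results passed to matplotlib for visualisation.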
## Non-Functional Requirements & Use Cases
-   Usability
    -   The program needs an easy-to-read, understandable GUI that lets the user request all the data they want, as well as a README document that explains how to use the program.
    -   The user will press a button on the GUI.
    -   The program will perform the task assigned to that button.
-   Reliability
    -   The program needs to be able to reliably load different data sizes and types and represent them in the correct graph.
    -   The user will provide several different inputs.
    -   The program will provide several accurate outputs.

# Researching & Planning
## Chosen Issue
-   Purpose: I am trying to present information about the top-performing videos in different locales in an easy-to-understand manner. This is important because upcoming creators can use this data to find out, for example, which genres perform best in a region or which video format is most popular. This would create more opportunities for content creators to make a career out of things they enjoy.
-   Missing Data: This analysis is important because this type of information is difficult, if not impossible, to find without knowing how to use an API. This program would make such information much easier to access.
-   Stakeholders: Content creators would benefit because they could identify trends much earlier than they currently can. The general public would benefit from access to a wider variety of content from various creators.
-   Use: Creators could use this information to identify trends before they peak, giving them time to make higher-quality videos without risking the trend dying before the video is complete. Creators would earn more revenue from the higher viewership that better videos attract, and viewers would benefit from the higher quality of content available to them.
## Privacy & Security
-   Data Privacy & Source: The YouTube API needs to protect data such as the names of a video's viewers and the creator's personal information. It also needs to protect the data of private channels and videos, which can only be accessed with authorised credentials supplied in a special JSON file.
-   Application Data Privacy: The application will need to retain the API's data protection measures, because exposing data such as private videos or channels could reveal personal information. If released to the public, the application will also need to keep the identity of whoever is requesting the data private.
-   Cyber Security: An application that handles data should have user authentication, password hashing, and encryption. User authentication is when the program uses various methods to verify the identity of the user accessing it. This can be achieved in my program via Google's OAuth service and the credentials JSON file for accessing private data. Password hashing is when a password is put through a one-way function that turns it into a fixed-length string that cannot feasibly be reversed, so the original password never needs to be stored. Encryption is when data is scrambled into a form that can only be unlocked with a special digital key. Unlike encrypted data, a hash is not meant to be reversed, so password hashing is not a form of encryption.
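
A minimal sketch of salted password hashing using only Python's standard library, in case the application ever stores its own passwords instead of relying solely on Google's OAuth service. The function names and the iteration count are assumptions chosen for illustration. Hashing here is one-way: a login attempt is checked by hashing it again with the stored salt and comparing digests, never by decrypting anything.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # Assumed work factor; tune for the target hardware.


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        # A fresh random salt makes identical passwords hash differently.
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    salt, stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, stored))
    print(verify_password("wrong guess", salt, stored))
```

Only the salt and digest would be stored; the original password is discarded after hashing.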

API KEY: (redacted)