# Identifying and Defining

* __Data:__ Phone Prices
* __Goal:__ To determine which phone of 2023 offers the best value.
* __Source:__ https://www.kaggle.com/datasets/berkayeserr/phone-prices
* __Access:__ publicly available
* __Access Method__: .csv file

# Functional Requirements
***

* __Data Loading__
    * Description: Load the information from a given .csv file into a pandas DataFrame
    * Input: .csv file
    * Output: System loads dataset and displays info in a dataframe

* __Use Case__
    * Actor: User
    * Goal: To load a dataset into the system.
    * Preconditions: User has a dataset file ready.
    * Main Flow:
        1. User places the dataset for reading into the correct folder.
        2. System validates the file format.
        3. System loads the dataset and displays the information in a dataframe.
    
    * Postconditions: Dataset is loaded and ready for the cleaning process
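
The main flow above could be sketched roughly as follows. This is only an illustration: the helper name `load_dataset`, the file name `phones_sample.csv` and the two sample rows are assumptions made for the example, not part of the actual program.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Validate the file format, then load the CSV into a DataFrame."""
    path = Path(path)
    # Step 2 of the main flow: reject anything that is not a .csv file
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No dataset found at: {path}")
    # Step 3: load the dataset into a DataFrame
    return pd.read_csv(path)


# Hypothetical two-row sample standing in for the real Kaggle file
with tempfile.TemporaryDirectory() as folder:
    sample_path = Path(folder) / "phones_sample.csv"
    sample_path.write_text("brand,price(USD)\nApple,816\nSamsung,699\n")
    phone_df = load_dataset(sample_path)

print(phone_df)
```

Checking the extension before reading keeps the error message specific, which fits the requirement that the system validates the file format first.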

***

* __Data Cleaning__
    * Description: Removes invalid or incomplete data from the dataset, then sorts or groups the remaining data by the criteria the user selects (e.g. brand, year, price).
    * Input: 
        * Loaded Dataset from Data Loading Process
        * Specified criteria for sorting/grouping
    * Output: Filtered dataset ready for analysis

* __Use Case__
    * Actor: User
    * Goal: To remove invalid data and sort/group the remaining data
    * Preconditions: User has loaded a dataset
    * Main flow:
        1. User decides how they would like to sort the data
        2. System sorts the data based on the given criteria
    * Postconditions: Invalid data has been removed and dataset is now sorted into groups
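
A minimal sketch of this cleaning step with pandas is shown here. The sample rows and the helper name `clean_and_sort` are hypothetical, and dropping every row that has a missing value is just one possible definition of "invalid".

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has many more columns
raw_df = pd.DataFrame({
    "brand": ["Apple", "Samsung", None, "Xiaomi"],
    "price(USD)": [816.0, 699.0, 450.0, None],
})


def clean_and_sort(df, sort_by):
    """Drop incomplete rows, then sort by the column the user chose."""
    if sort_by not in df.columns:
        raise KeyError(f"Unknown column: {sort_by}")
    cleaned = df.dropna()  # remove rows with any missing value
    return cleaned.sort_values(by=sort_by).reset_index(drop=True)


cleaned_df = clean_and_sort(raw_df, sort_by="price(USD)")
print(cleaned_df)

# Grouping works on the cleaned data, e.g. the mean price per brand
mean_price_by_brand = cleaned_df.groupby("brand")["price(USD)"].mean()
print(mean_price_by_brand)
```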

*** 

* __Data Analysis__
    * Description: Statistical analysis of mean, median, mode and range
    * Input: 
        * Filtered dataset from Data Cleaning Process
        * Specified criteria for generating the mean, median, mode and range
    * Output: The mean, median, mode or range for the specified criteria

* __Use Case__
    * Actor: User
    * Goal: To obtain a mean, median, mode, or range for the specified criteria
    * Preconditions: User has gone through the data cleaning process
    * Main flow:
        1. User selects what data they want to use
        2. User selects among mean, median, mode, or range for the data they would like
        3. System calculates the statistic the user selected
    * Postconditions: The mean, median, mode, or range for the specified criteria has been obtained
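
One way to express this selection is a small dispatch function over a pandas Series. The function name `summarise` and the sample prices are assumptions made for the illustration.

```python
import pandas as pd


def summarise(series, statistic):
    """Return one statistic for a numeric pandas Series."""
    if statistic == "mean":
        return series.mean()
    if statistic == "median":
        return series.median()
    if statistic == "mode":
        # Several values can tie for most common, so all of them are returned
        return series.mode().tolist()
    if statistic == "range":
        return series.max() - series.min()
    raise ValueError(f"Unsupported statistic: {statistic}")


# Hypothetical prices used only to demonstrate the function
prices = pd.Series([200.0, 300.0, 300.0, 800.0], name="price(USD)")
for name in ("mean", "median", "mode", "range"):
    print(name, summarise(prices, name))
```

Raising `ValueError` for an unknown statistic lets the calling menu code decide how to prompt the user again.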

***

* __Data Visualisation__
    * Description: Have the data be visualised in the form of Pandas Dataframes and Matplotlib charts.
    * Input: Given statistics from Data Analysis process
    * Output: Generate visual statistics from Data Analysis process in the form of Pandas Dataframes and different Matplotlib charts.

* __Use Case__
    * Actor: User
    * Goal: To visualise given data
    * Preconditions: User has gone through the Data Analysis Process
    * Main flow:
        1. User selects how they would like the data to be visualised
        2. System visualises the data according to given criteria
    * Postconditions: Data has been visualised and is ready for the reporting process
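
As a rough sketch, a bar chart of prices per brand could be produced as follows. The figures are hypothetical, and the `Agg` backend is chosen here only so the example runs without a display; in a notebook `plt.show()` would normally be used instead of saving to a buffer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mean prices per brand
mean_prices = pd.DataFrame({
    "brand": ["Apple", "Samsung", "Xiaomi"],
    "price(USD)": [816.0, 699.0, 450.0],
})

ax = mean_prices.plot(
    kind="bar",
    x="brand",
    y="price(USD)",
    legend=False,
    title="Mean phone price by brand",
)
ax.set_ylabel("Price (USD)")
plt.tight_layout()

# Save the chart to an in-memory buffer instead of opening a window
image = io.BytesIO()
plt.savefig(image, format="png")
```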
***

* __Data Reporting__
    * Description: Stores final dataset in a .csv file.
    * Input: Statistics from Data Analysis process
    * Output: Store Data Analysis statistics in a .csv file

* __Use Case__
    * Actor: User
    * Goal: To store final dataset into a .csv file
    * Preconditions: User has gone through the data analysis process
    * Main flow:
        1. User selects whether they would like the final dataset to be stored
        2. System stores the final dataset into a .csv file
    * Postconditions: Final dataset has been stored into a .csv file
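
Storing the results might look like the sketch below. The statistics table is hypothetical, and an in-memory buffer stands in for the output file so the example leaves nothing on disk; passing a file path to `to_csv` works the same way.

```python
import io

import pandas as pd

# Hypothetical statistics produced by the analysis step
stats_df = pd.DataFrame({
    "statistic": ["mean", "median"],
    "price(USD)": [400.0, 300.0],
})


def export_report(df, target):
    """Write the DataFrame as CSV, leaving out the row index."""
    df.to_csv(target, index=False)


buffer = io.StringIO()
export_report(stats_df, buffer)
print(buffer.getvalue())
```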
***

# Non-Functional Requirements
***

* __Usability__ 
    * The User Interface (UI) should be simple, intuitive and easy to use, with clear instructions on how to proceed. The UI must not break on invalid input; instead it should explain what went wrong and how to continue.
    * The README document should include detailed instructions on how to use the system and how to troubleshoot the program if needed (potential issues and solutions).
* __Reliability__
    * Error Handling: If an error occurs, the system should display a clear message describing the error and how to correct it. E.g. if a user types a string where an integer is expected, the system should guide the user towards entering a valid integer.
    * The system must validate data inputs before processing. This includes checking that the data conforms to the format and specification of its column, for example that "announcement_date" is formatted correctly (YYYY-MM-DD).
    * To preserve data accuracy, if the user adds any data, the system should write it to a separate copy of the dataset so that the original dataset is left unchanged.
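
The two validation rules above could be sketched with the standard library alone. The function names `parse_menu_choice` and `is_valid_date` are hypothetical, and the 1 to 5 range is taken from the menu used later in this document.

```python
from datetime import datetime


def parse_menu_choice(text, lowest=1, highest=5):
    """Return the choice as an int, or None if it is not a valid option."""
    try:
        choice = int(text.strip())
    except ValueError:
        return None  # not a whole number, so the caller can ask again
    return choice if lowest <= choice <= highest else None


def is_valid_date(text):
    """Check that text is a real calendar date written as YYYY-MM-DD."""
    # The length check rejects unpadded forms such as 2022-9-7
    if len(text) != 10:
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(parse_menu_choice("3"), parse_menu_choice("abc"))
print(is_valid_date("2022-09-07"), is_valid_date("07-09-2022"))
```

Returning `None` or `False` rather than raising keeps the decision about what message to show in the UI code.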
***

# Research and Planning

__Research of Chosen Issue__

*Purpose:* The purpose of this research is to determine the best value phone of 2023 by analysing the affordability of phones based on their specifications. The analysis will focus on identifying which phones offer the best balance between price and features, and on how a phone's price relates to its production costs.

*Missing Data:* This analysis is necessary because the raw data only lists prices and specifications; on its own it does not show which phones offer the best value.

*Stakeholders + Use:* Consumers will benefit significantly from this analysis as it will help them understand where their money is going and whether they are getting the best value for it. This information will empower consumers to make informed decisions, ensuring they do not overspend on features they do not need while maximising the utility of their purchase. Additionally, this research will help consumers assess the value of specific phone features and determine whether they are worth the additional cost, considering their personal usage needs.


__Background Research__

_What makes a phone affordable?_

* Production:
    * Lower production costs
    * Cheaper materials
    * Production in countries with lower labour costs
    * Lower profit margins
    * Fewer high-tech features and specifications

* Common specifications for Affordable Phones:
    * 6.1 to 6.2-inch screens
    * Often use LCD panels, although OLED is becoming increasingly common
    * Refresh rates are typically 60 Hz to 120 Hz
    * Mid-range chipsets from Qualcomm, MediaTek, or Samsung
    * 4 GB to 8 GB of RAM
    * 64 GB to 128 GB of internal storage
    * Multiple rear cameras, front-facing selfie camera
    * Battery capacities range from 4000 mAh to 5000 mAh
    * Normally Android, with fewer software updates than flagship phones
    * 4G LTE, with 5G becoming increasingly common

__Privacy & Security__

* Data Privacy of Source:
    * The phone data is sourced from Kaggle, but it may originally have come from manufacturers or data analysts. Those sources are responsible for ensuring that the data does not contain any information that infringes the privacy of any business, participant or consumer. However, this dataset does not contain any personal data; it covers only phone models, prices, release dates and specifications, so no consumer's privacy can be affected.

* Application Data Privacy:
    * While the dataset does not contain any personal data, it remains important to consider privacy from the perspective of the organisations or brands involved. If the dataset includes sensitive information about companies or proprietary details, this must be protected. If the dataset is expanded to include any personal data, this must be anonymised.
    * If I were to release this application to the general public, it would be my responsibility to ensure that any personal data collected is anonymised or securely stored and processed.

* Cyber Security:
    * Input Validation: All user input should be validated to prevent injection attacks and other malicious inputs. This includes sanitizing data and checking for proper formatting.
    * Access Control: Implement proper authorization mechanisms to ensure users can only access resources they're permitted to.
    * Secure Communication: Use HTTPS to encrypt data in transit between the client and server. This protects sensitive information from interception.
    * Error Handling: Implement proper error handling to avoid exposing sensitive information through error messages.
    * Regular Updates: Keep all software components, including third-party libraries, up-to-date to patch known vulnerabilities.
    * Web Application Firewall (WAF): Employ a WAF to filter and monitor HTTP traffic between the web application and the Internet.
    * User Authentication: User authentication is the process of verifying the identity of a user attempting to access a system or application. It typically involves:
        * Requesting credentials (e.g., username and password)
        * Verifying these credentials against stored information
        * Granting or denying access based on the verification result
        * Strong authentication often includes multi-factor authentication (MFA) for an additional layer of security.
    * Password Hashing: Password hashing is a security technique used to protect stored passwords. It involves:
        * Converting passwords into fixed-length strings of characters
        * Using a one-way cryptographic function, making it computationally infeasible to reverse
        * Storing the hash instead of the plain text password
    * Encryption: Encryption is the process of encoding information to protect its confidentiality. In web applications, encryption is crucial for:
        * Protecting data in transit (using HTTPS)
        * Securing sensitive data at rest (in databases or file systems)
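
To make the password hashing steps above concrete, here is a minimal sketch using PBKDF2 from Python's standard library. The function names, the 16-byte salt and the iteration count are assumptions made for illustration; the current application has no login feature, and a real one should follow up-to-date guidance on parameters.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for this sketch


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare the results."""
    _, digest = hash_password(password, salt)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Only the salt and the digest are stored; the plain text password never needs to be saved.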


__Data Dictionaries__

|  Field  | Datatype | Format for Display | Description | Example | Validation |
|:--------|:--------|:--------|:--------|:--------|:--------|
|phone_name         |object     |XX...XX NNN       |Name of phone model                    |iPhone 14    |Can be any amount of characters and can include numbers          |
|brand              |object     |XX...XX            |Brand of phone                         |Apple        |Can be any amount of characters but not include numbers          |
|os                 |object     |XX...NN            |Operating System version of phone      |iOS 16       |Can be any amount of characters and can include numbers          |
|resolution         |object     |NNNN x NNNN        |Phone screen resolution                |1170x2532    |Must be 2 sets of numbers with 'x' in between                    |
|battery            |float      |NNNN               |Battery Capacity (mAh)                 |3279         |Must be 4 digit number without decimals                          |
|battery_type       |object     |XX...XX            |Type of battery                        |Li-Ion       |Can be any amount of characters but can not include numbers      |
|ram(GB)            |int64      |N                  |Amount of RAM in GB                    |6            |Must be a number without any decimals                            |
|announcement_date  |datetime64 |YYYY-MM-DD         |Date of when phone was announced       |2022-09-07   |Must be in the format of YYYY-MM-DD, cannot be in reverse order  |
|weight(g)          |float      |N.N                |Weight of phone                        |172.0        |Must be decimal number to 1 decimal place                        |
|storage(GB)        |int64      |N                  |Phone internal storage size            |128          |Must be a number without any decimals                            |
|video_720p         |boolean    |T or F             |Whether phone supports 720p video      |True         |Must be either "true" or "false"                                 |
|video_1080p        |boolean    |T or F             |Whether phone supports 1080p video     |True         |Must be either "true" or "false"                                 |
|video_4k           |boolean    |T or F             |Whether phone supports 4k video        |True         |Must be either "true" or "false"                                 |
|video_8k           |boolean    |T or F             |Whether phone supports 8k video        |False        |Must be either "true" or "false"                                 |
|video_30fps        |boolean    |T or F             |Whether phone supports 30fps video     |True         |Must be either "true" or "false"                                 |
|video_60fps        |boolean    |T or F             |Whether phone supports 60fps video     |True         |Must be either "true" or "false"                                 |
|video_120fps       |boolean    |T or F             |Whether phone supports 120fps video    |True         |Must be either "true" or "false"                                 |
|video_240fps       |boolean    |T or F             |Whether phone supports 240fps video    |True         |Must be either "true" or "false"                                 |
|video_480fps       |boolean    |T or F             |Whether phone supports 480fps video    |False        |Must be either "true" or "false"                                 |
|video_960fps       |boolean    |T or F             |Whether phone supports 960fps video    |False        |Must be either "true" or "false"                                 |
|price(USD)         |float64    |N.NNN              |Price of phone in USD                  |816          |Must be a decimal number with up to 3 decimal places             |
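
A few of the validation rules in the table could be checked per record as sketched below. The function name `validate_row` and the choice of which fields to check are assumptions for this example; the sample values come from the Example column.

```python
import re

# Two groups of digits with an 'x' in between, e.g. 1170x2532
RESOLUTION_PATTERN = re.compile(r"\d+\s?x\s?\d+")


def validate_row(row):
    """Return the names of fields that break their validation rule."""
    problems = []
    if not RESOLUTION_PATTERN.fullmatch(str(row.get("resolution", ""))):
        problems.append("resolution")
    ram = row.get("ram(GB)")
    # bool is a subclass of int in Python, so it is excluded explicitly
    if isinstance(ram, bool) or not isinstance(ram, int):
        problems.append("ram(GB)")
    if not isinstance(row.get("video_4k"), bool):
        problems.append("video_4k")
    price = row.get("price(USD)")
    if (isinstance(price, bool) or not isinstance(price, (int, float))
            or price < 0):
        problems.append("price(USD)")
    return problems


good_row = {"resolution": "1170x2532", "ram(GB)": 6,
            "video_4k": True, "price(USD)": 816.0}
bad_row = {"resolution": "1170-2532", "ram(GB)": 6.5,
           "video_4k": "yes", "price(USD)": -1}
print(validate_row(good_row))
print(validate_row(bad_row))
```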


# Documentation

_21/7/24:_

Began the program by importing the sample code and adapting it to see how everything works. Hit an error when reading the dataset (Errno 2, file not found), so for now the program reads the file from an absolute path.
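
An alternative to hard-coding an absolute path, sketched here under the assumption that the dataset lives in a `data` folder inside the project directory, is to build the path with `pathlib` and check it before reading. The helper name `resolve_data_file` is hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_data_file(filename, project_dir):
    """Build the path to a file in the project's data folder and check it."""
    candidate = Path(project_dir) / "data" / filename
    if not candidate.is_file():
        # Reporting the full path makes an Errno 2 much easier to diagnose
        raise FileNotFoundError(f"Dataset not found, looked in: {candidate}")
    return candidate


# A temporary folder stands in for the real project directory
with tempfile.TemporaryDirectory() as project_dir:
    data_dir = Path(project_dir) / "data"
    data_dir.mkdir()
    (data_dir / "phone_brand&price.csv").write_text("Apple,816\n")
    found = resolve_data_file("phone_brand&price.csv", project_dir)
    print(found.name)
```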

The data visualisation function was not plotting, so I debugged it by printing a message before and after the plot call and by adding an error message to show what was wrong. The error message reported 'invalid numerical data'. The cause was the dataset itself: its first line already contained the field names, so once the column names were applied in code the DataFrame held two rows of field names. Fixed by removing the field names from the dataset .csv file.
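
The duplicated field names can be reproduced in a few lines, along with another possible fix that leaves the .csv file untouched: passing `header=0` so pandas replaces the existing header with the supplied names. The sample rows are hypothetical.

```python
from io import StringIO

import pandas as pd

csv_text = "brand,price(USD)\nApple,816\nSamsung,699\n"

# header=None treats the field names as data, so the price column
# contains the text 'price(USD)' and can no longer be plotted as numbers
with_extra_row = pd.read_csv(
    StringIO(csv_text), header=None, names=["brand", "price"]
)

# header=0 marks the first line as field names and replaces them
renamed = pd.read_csv(
    StringIO(csv_text), header=0, names=["brand", "price"]
)

print(with_extra_row)
print(renamed)
```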

Also substituted the dataset with a copy that contains only 2 fields and 10 rows to ease the testing and debugging process.

Added the 'getAnalysis' function, which asks the user whether they want the mean, median, mode, max or min.
Added the 'getStatistical_Data' function, which calculates the mean, median, mode, max and min. However, the statistical data was not printing, so I debugged it. The cause was using the Python statistics call "(print(statistics.mean([analysis_type])))" rather than the pandas variant "print(phone_df.mean(0,True,True))".
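
The reason the first call fails is that `[analysis_type]` is a list holding the column *name*, not the column's values. A small sketch with hypothetical data shows the difference:

```python
import statistics

import pandas as pd

# Hypothetical sample data
phone_df = pd.DataFrame({
    "brand": ["Apple", "Samsung", "Xiaomi"],
    "price": [200.0, 300.0, 400.0],
})
analysis_type = "price"

# This passes the list ['price'] to statistics.mean, which cannot
# average a string, so an exception is raised
failed = False
try:
    statistics.mean([analysis_type])
except (TypeError, ValueError) as err:
    failed = True
    print(f"statistics.mean failed: {err}")

# Looking the column up first gives the statistics module real numbers
print(statistics.mean(phone_df[analysis_type].tolist()))

# The pandas equivalent works directly on the column
print(phone_df[analysis_type].mean())
```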

(CODE BELOW)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import sys
import statistics

quit = False

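# The field names were removed from the .csv file, so header=None stops
# pandas from treating the first data row as column names
old_data_df = pd.read_csv('/Data/Darrell/Computing Tech/phone_data-analysis_assessment/data/phone_brand&price.csv',
                            header=None)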

phone_df = pd.read_csv('/Data/Darrell/Computing Tech/phone_data-analysis_assessment/data/phone_brand&price.csv',
                            header=None,
                            names=['brand', 'price'])
                            #names=['brand', 'price(USD)'])
                            #names=['phone_name', 'brand', 'os', 'inches', 'resolution', 'battery', 'battery_type', 'ram(GB)', 'announcement_date', 'weight(g)', 'storage(GB)', 'video_720p', 'video_1080p', 'video_4K', 'video_8K', 'video_30fps', 'video_60fps', 'video_120fps', 'video_240fps', 'video_480fps', 'video_960fps', 'price(USD)'])

def getOriginalData():
    print(old_data_df)

def getUpdatedData():
    print(phone_df)

def getCharts():
    print('before try')

    try:
        
        print('before plot')
        phone_df.plot(
                        kind='bar',
                        x='brand',
                        y='price',  # must match the column names given to read_csv above
                        color='blue',
                        alpha=0.3,
                        title='Phone Prices by Brand')
        print('after plot')
        plt.show()
        print('after show')
    
    except Exception as err:
        print('error encountered:')
        print(f'caught {err=}, {type(err)=}')

def getAnalysis():
    analysis_type = input('What data would you like to use? ')

    if analysis_type == 'price':
        statistical_data = input("Would you like to see the mean, median, mode, max or min of the price? ")
        getStatistical_Data(analysis_type, statistical_data)

    else:
        print('Please enter a valid option')

def getStatistical_Data(analysis_type, statistical_data):
    
    #print('INSIDE STATISTICAL DATA')
    #print('ANALYSIS TYPE')
    #print(analysis_type)
    #print('STATISTICAL TYPE')
    #print(statistical_data)

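    # Keyword arguments make each call explicit; numeric_only=True skips
    # the text 'brand' column so only numeric columns are summarised
    if statistical_data == 'mean':
        #print(statistics.mean([analysis_type]))
        print(phone_df.mean(axis=0, skipna=True, numeric_only=True))
    elif statistical_data == 'median':
        #print(statistics.median([analysis_type]))
        print(phone_df.median(axis=0, skipna=True, numeric_only=True))
    elif statistical_data == 'mode':
        #print(statistics.mode([analysis_type]))
        print(phone_df.mode(axis=0, numeric_only=True))
    elif statistical_data == 'max':
        print(phone_df.max(axis=0, skipna=True, numeric_only=True))
    elif statistical_data == 'min':
        print(phone_df.min(axis=0, skipna=True, numeric_only=True))
    else:
        print('Please enter a valid option')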

def userOptions():
    global quit

    print("""Welcome to the Phone Prices Data Explorer!
          
    Please select an option:
    1 - Show the original dataset
    2 - Show the updated Data Frame
    3 - Analyse the mean, median, mode, max or min of given data
    4 - Visualise phone prices by brand
    5 - Quit Program
        """)
    
    try:
        choice = int(input('Enter Selection: '))

        if choice == 1:
            getOriginalData()
        elif choice == 2:
            getUpdatedData()
        elif choice == 3: 
            getAnalysis()
        elif choice == 4:
            getCharts()
        elif choice == 5:
            quit = True
        else:
            print('Please enter a number between 1 and 5.')

    except Exception as err:
        print('error encountered:')
        print(f'caught {err=}, {type(err)=}')
        print('Please enter a whole number between 1 and 5.')
   

while not quit:
    userOptions()