# Project

The dataset was obtained from this [link](https://www.kaggle.com/artempozdniakov/ukrainian-market-mobile-phones-data).

The dataset contains data about mobile phones that were released in the past few years and can be bought in Ukraine.

### **Data Dictionary**

**ID**
A unique ID for each row.

**brand_name**
The name of the brand that manufactures the phone.

**model_name**
The name of the phone's model.

**os**
The operating system of the phone.

**popularity**
The popularity of the phone, in the range 1-1224, where 1224 is the most popular and 1 is the least popular.

**best_price**
Best price in the price range, in Ukrainian hryvnias (UAH).

**lowest_price**
Lowest price in the price range, in Ukrainian hryvnias (UAH).

**highest_price**
Highest price in the price range, in Ukrainian hryvnias (UAH).

**sellers_amount**
The number of sellers who sell this phone.

**screen_size**
The size of the phone's screen (inches).

**memory_size**
The size of the phone's memory (GB).

**battery_size**
The capacity of the phone's battery (mAh).

**release_date**
The year and month when the phone was released.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
phone = pd.read_csv("phones_data.csv")

In [None]:
phone.shape

In [None]:
phone.head()

In [None]:
# Rename the first column from "Unnamed: 0" to "ID"
phone.rename(columns={"Unnamed: 0":"ID"}, inplace=True)
phone.head(5)

In [None]:
# Set the ID column as the index
phone.set_index("ID", inplace=True)
phone[["os"]].head()

In [None]:
phone.dtypes

In [None]:
# Convert release_date to datetime
phone["release_date"] = pd.to_datetime(phone["release_date"])
# After running this cell, re-run the cell above to check that the dtype of release_date has changed

In [None]:
# Create a new release_year column by extracting the year from release_date
phone["release_year"] = pd.DatetimeIndex(phone["release_date"]).year
phone["release_year"]
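The type change and the year extraction can also be checked in a single cell, without re-running an earlier one. The sketch below uses a small made-up frame named `toy`; the month-year string layout and the explicit `format="%m-%Y"` argument are assumptions for illustration and may not match the actual file.

```python
import pandas as pd

# Made-up rows; the "month-year" string layout is an assumption for illustration
toy = pd.DataFrame({"release_date": ["10-2020", "03-2019", "12-2021"]})

# Parse the strings into datetimes using the assumed layout
toy["release_date"] = pd.to_datetime(toy["release_date"], format="%m-%Y")

# Check the dtype right away instead of re-running an earlier cell
print(toy["release_date"].dtype)

# The .dt accessor exposes the year of each datetime value
toy["release_year"] = toy["release_date"].dt.year
print(toy["release_year"].tolist())
```

Printing the dtype in the same cell keeps the check next to the conversion it verifies.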

In [None]:
# Summary statistics for the numeric columns
phone.describe()

In [None]:
# Summary information for the string columns
phone.describe(include=['object'])

## **Cleaning data**

In [None]:
# Count the missing values in each column
phone.isnull().sum()

In [None]:
phone["os"].value_counts(dropna=False)

In [None]:
# Drop the rows where os is missing
phone.dropna(subset=["os"], inplace=True)
# Re-run the cell above to check that the NaN values have been dropped

In [None]:
# Fill the missing values in lowest_price and highest_price with each column's mean
phone.fillna(value={"lowest_price": phone["lowest_price"].mean()}, inplace=True)

In [None]:
phone.fillna(value={"highest_price": phone["highest_price"].mean()}, inplace=True)
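Both columns can also be filled in one step. This is a minimal sketch on made-up prices, not the notebook's data; the names `toy` and `price_means` are hypothetical. Rounding is chained here only to show that the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Made-up prices with one missing value per column
toy = pd.DataFrame({
    "lowest_price": [10.0, np.nan, 20.5],
    "highest_price": [30.0, 40.0, np.nan],
})

# mean() skips NaN by default, so each mean comes from the observed values only
price_means = toy[["lowest_price", "highest_price"]].mean()

# fillna() with a Series fills each column with the value stored under its label;
# both fillna() and round() return new frames, so the result is assigned back
toy = toy.fillna(price_means).round(2)
print(toy)
```

Mean imputation keeps every row but pulls the filled entries toward the centre of the distribution, so it is worth checking how many values were filled before interpreting the price columns.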

In [None]:
phone

In [None]:
# Round both columns to 2 decimal places; round() returns a new DataFrame, so assign it back
phone = phone.round({"lowest_price": 2, "highest_price": 2})
phone

## **Use a histogram to see the number of mobile phones released over the release-date period.**

In [None]:
phone["release_date"].value_counts().sort_index()

In [None]:
plt.hist(phone["release_date"])
plt.title("The Number of Phones Released Each Year (2013-2021)")

## **Look at the phones' OS distribution using a pie chart**

In [None]:
# Identify the distinct values in the "os" column
phone["os"].unique()

In [None]:
phone["os"].value_counts()

In [None]:
# Count Android and iOS separately and group every other OS into Other_OS
# to get a clearer pie chart
Android = (phone["os"] == "Android").sum()
iOS = (phone["os"] == "iOS").sum()
Other_OS = (~phone["os"].isin(["Android", "iOS"])).sum()

Android, iOS, Other_OS
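The same three counts can be produced in one pass by relabelling every other OS before counting. The sketch below runs on a made-up Series; the names `os_column`, `grouped` and `os_counts`, and the OS labels in the sample, are illustrative only.

```python
import pandas as pd

# Made-up OS labels standing in for the "os" column
os_column = pd.Series(
    ["Android", "Android", "iOS", "KaiOS", "Android", "WindowsPhone"]
)

# Keep Android and iOS as they are; replace every other label with "Other"
grouped = os_column.where(os_column.isin(["Android", "iOS"]), "Other")

# value_counts() then gives one count per remaining label
os_counts = grouped.value_counts()
print(os_counts.to_dict())
```

With this approach, missing values would also end up in "Other" unless they are dropped first, because `isin` returns `False` for them.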

In [None]:
plt.pie([Android, iOS, Other_OS],
        labels=("Android", "iOS", "Other"),
        autopct="%.2f%%",     # show percentages with 2 decimal places
        counterclock=False,   # draw the slices clockwise
        explode=(0.2, 0, 0))  # pull the Android slice out
plt.title("Phones' OS Distribution")

## **Use a scatter plot to see the relationship between phones' popularity and the number of sellers.**

In [None]:
phone.plot(kind="scatter", x="popularity", y="sellers_amount")
plt.title("Relationship between Phone Popularity and Number of Phone Sellers")
# The two variables are only weakly related, though a slight positive relationship is visible
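A visual impression like this can be backed by a number. The sketch below computes the Pearson correlation coefficient with `Series.corr` on made-up values; it says nothing about the real dataset, and the frame `toy` is hypothetical.

```python
import pandas as pd

# Made-up values standing in for the two columns of the scatter plot
toy = pd.DataFrame({
    "popularity": [1, 2, 3, 4, 5],
    "sellers_amount": [2, 1, 4, 3, 6],
})

# Series.corr() defaults to the Pearson coefficient, which lies between -1 and 1
pearson = toy["popularity"].corr(toy["sellers_amount"])
print(round(pearson, 3))
```

A value near 0 would support the "not much related" reading, while a value closer to 1 would point to a stronger positive relationship.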

## **Create new price columns in MYR using a lambda function or a regular function.**

In [None]:
# Create new price columns in MYR, using the exchange rate 1 UAH = 0.16 MYR
phone["best_price_myr"]= phone["best_price"].apply(lambda x: round(x*0.16, 2))
phone["lowest_price_myr"]= phone["lowest_price"].apply(lambda x: round(x*0.16, 2))
phone["highest_price_myr"]= phone["highest_price"].apply(lambda x: round(x*0.16, 2))
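For comparison, the same conversion can be written without `apply`, since multiplying a column by a constant works element-wise. This is a sketch on made-up prices; `UAH_TO_MYR`, `toy`, `with_lambda` and `vectorized` are hypothetical names, and the rate is the one quoted in the notebook rather than a current one.

```python
import pandas as pd

UAH_TO_MYR = 0.16  # rate quoted in the notebook; real exchange rates change over time

# Made-up prices in UAH
toy = pd.DataFrame({"best_price": [1000.0, 2500.0, 12999.0]})

# Lambda version, as used in the notebook
with_lambda = toy["best_price"].apply(lambda x: round(x * UAH_TO_MYR, 2))

# Vectorized version: multiply the whole column, then round it
vectorized = (toy["best_price"] * UAH_TO_MYR).round(2)

print(vectorized.tolist())
```

The vectorized form avoids a Python-level function call per row, which tends to matter only on much larger frames than this one.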

In [None]:
phone["best_price_myr"]

In [None]:
phone.head(5)

## Build a class that provides two functions:

**1. A user-input function that displays one of the diagrams from the cells above**

**2. A user-input function that tells the user the minimum budget they should have for a specific phone brand**

In [None]:
# Calculate the mean best price for each phone brand
round(phone.groupby(["brand_name"])["best_price_myr"].mean(),2)

In [None]:
# Convert the result of the cell above into a dictionary
brand_budget = dict(round(phone.groupby(["brand_name"])["best_price_myr"].mean(),2))
brand_budget
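The group-then-convert step can be tried on a tiny frame first to see the shape of the result. The sketch below uses made-up brands and prices; `toy` and `toy_budget` are hypothetical names, and `Series.to_dict()` is used here in place of `dict(...)` so that the values come back as plain Python floats.

```python
import pandas as pd

# Made-up rows: two phones for one brand and one phone for another
toy = pd.DataFrame({
    "brand_name": ["Apple", "Apple", "Samsung"],
    "best_price_myr": [100.0, 200.0, 50.0],
})

# Mean price per brand, rounded, then converted to a {brand: mean} dictionary
toy_budget = (
    toy.groupby("brand_name")["best_price_myr"]
    .mean()
    .round(2)
    .to_dict()
)
print(toy_budget)
```

The dictionary keys are the brand names exactly as they are spelled in the data, so any later lookup has to match that spelling and capitalization.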

In [None]:
class Phone_Data(object):

    def __init__(self):
        # No instance state is set up here; the methods read the notebook's global variables
        pass
        
    # Show one of the diagrams from the cells above, chosen by user input
    def phone_diagram(self):
        command= input("Please enter:" 
                       "\n 1 for Histogram "
                      "\n 2 for Pie chart "
                      "\n 3 for Scatter plot \n")
        
        if command == "1":
            plt.title("The Number of Phones Released Each Year (2013-2021)")
            plt.hist(phone["release_date"])
            
        elif command == "2":
            plt.title("Phones' OS Distribution")
            plt.pie([Android, iOS, Other_OS],
                   labels=("Android", "iOS", "Other"),
                   autopct= "%.2f%%",
                   counterclock=False,
                   explode=(0.2,0,0))
            
        elif command == "3":
            plt.title("Relationship between Popularity and Number of "
                      "Phone Sellers")
            plt.scatter(x=phone["popularity"], y=phone["sellers_amount"])
            
        else:
            print("Please enter a valid option as provided above.")
        
    
    # Let the user choose a brand and print the budget they may need for a phone of
    # that brand (the mean of phone["best_price_myr"] for the brand)
    def phone_budget(self, brand_budget):
        self.brand_budget = brand_budget
        budget = input("Please enter the number corresponding to the phone brand you are interested in: "
                      "\n 1 for Apple "
                      "\n 2 for Samsung "
                      "\n 3 for Realme"
                      "\n 4 for Asus"
                      "\n 5 for Lenovo"
                      "\n 6 for Google \n")
        
        if budget == "1":
            brand="Apple"

        elif budget == "2":
            brand="Samsung"
           
        elif budget == "3":
            brand="realme"
            
        elif budget == "4":
            brand="ASUS"
           
        elif budget == "5":
            brand="Lenovo"
           
        elif budget == "6":
            brand="Google"
            
        else:
            print("Please enter a valid option as provided above.")
            # Stop here: brand is undefined when the option is invalid
            return

        if brand in brand_budget:
            print(f"You should have a minimum budget of RM {brand_budget.get(brand)} for a {brand} mobile phone.")
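The chain of `if`/`elif` branches in `phone_budget` can also be expressed as a dictionary lookup, which makes an invalid choice easy to detect. The sketch below is a non-interactive variant for illustration only: `BRAND_MENU`, `budget_message` and `sample_budget` are hypothetical names, and the prices are made up.

```python
# Menu numbers mapped to brand labels, spelled as in the notebook's menu code
BRAND_MENU = {
    "1": "Apple",
    "2": "Samsung",
    "3": "realme",
    "4": "ASUS",
    "5": "Lenovo",
    "6": "Google",
}


def budget_message(choice, brand_budget):
    """Return a budget message, or None if the choice or the brand is unknown."""
    brand = BRAND_MENU.get(choice)
    if brand is None or brand not in brand_budget:
        return None
    return f"Budget for {brand}: RM {brand_budget[brand]}"


# Made-up budget values, not taken from the dataset
sample_budget = {"Apple": 3500.0, "Samsung": 1200.0}

print(budget_message("1", sample_budget))  # a valid choice
print(budget_message("9", sample_budget))  # an invalid choice returns None
```

Returning the message instead of printing it keeps the lookup testable without typing into an `input()` prompt.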

In [None]:
data = Phone_Data()

In [None]:
data.phone_diagram()

In [None]:
data.phone_budget(brand_budget)

That is all. Thank you.