## Situational Overview (Business Need)

You are working for a wealthy investor who specialises in purchasing undervalued assets. This investor’s due diligence on every purchase includes a detailed analysis of the data that underlies the business, both to understand the fundamentals of the business and, especially, to identify opportunities to drive profitability by changing which products or services are offered.

The investor is interested in purchasing TellCo, an existing mobile service provider in the Republic of Pefkakia. TellCo’s current owners have been willing to share their financial information but have never employed anyone to look at the data that their systems generate automatically.

Your employer wants you to provide a report that analyses opportunities for growth and makes a recommendation on whether TellCo is worth buying or selling. You will do this by analysing a telecommunication dataset that contains useful information about the customers and their activities on the network. You will deliver the insights you extract to your employer through an easy-to-use web-based dashboard and a written report.

In [8]:
import numpy as np
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [9]:
df=pd.read_csv('../data/clean_data.csv')

In [10]:
df.head()

,Unnamed: 0,Bearer_Id,Dur. (ms),IMSI,MSISDN,IMEI,last_location,avg_rtt_dl,avg_rtt_ul,throughput_avg_dl_kpbs,...,youtube_ul_b,netflix_dl_b,netflix_ul_b,gaming_dl_b,gaming_ul_b,other_dl_b,other_ul_b,Total_ul_b,Total_dl_b,t_vol_ul_6250_37500B
0,11,13114483557479700480,889834.0,208201908306215.0,33664473872.0,86009102759374.0,T21335C,217.0,4.0,28305.0,...,16817598.0,18094847.0,2029991.0,504604946.0,1063672.0,526987098.0,11382619.0,33691738.0,535118044.0,8.0
1,12,13114483510574800896,850766.0,208200314328074.0,33603291937.0,35665009621983.0,D76026B,45.0,5.0,61.0,...,10610680.0,12189103.0,10621276.0,766292761.0,3655164.0,34550147.0,11326781.0,39654040.0,809144948.0,8.0
2,13,13042425955434700800,812507.0,208200314385130.0,33659219748.0,35573109931422.0,L20434C,45.0,5.0,0.0,...,21635303.0,18084649.0,9285105.0,530192847.0,10269993.0,235638196.0,2282987.0,48241438.0,576947969.0,8.0
3,14,13042425919178199040,777887.0,208200314385130.0,33659219748.0,35573109931422.0,L20434C,79.0,4.0,4148.0,...,3025236.0,14599101.0,11634512.0,251467133.0,3629272.0,40078751.0,10359946.0,32480400.0,298662697.0,8.0
4,15,7277825621540039680,780471.0,208201716888047.0,33753758738.0,35573009195619.0,T88606B,39.0,33.0,263.0,...,17842011.0,322630.0,5235123.0,26045464.0,16321224.0,137148601.0,15351908.0,58387271.0,42659750.0,8.0
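
The `Unnamed: 0` column in the preview above looks like a row index that an earlier `to_csv` call wrote into the file, although the notebook does not confirm this. A minimal sketch of dropping such columns after loading, using a small in-memory CSV as a hypothetical stand-in for `clean_data.csv`:

```python
import io
import pandas as pd

# Hypothetical miniature of the CSV: the blank first header is how a saved
# index typically appears, and pandas names it 'Unnamed: 0' on read.
csv_text = ",Bearer_Id,Dur. (ms)\n11,a,889834.0\n12,b,850766.0\n"
toy = pd.read_csv(io.StringIO(csv_text))

# Collect every column whose name starts with 'Unnamed' and drop it.
unnamed_cols = [c for c in toy.columns if c.startswith("Unnamed")]
toy_clean = toy.drop(columns=unnamed_cols)

print(unnamed_cols)
print(toy_clean.columns.tolist())
```

Whether the column is safe to drop in the real data depends on whether anything downstream still uses it as a row key.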


In [7]:
df.shape

(149418, 53)

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149418 entries, 0 to 149417
Data columns (total 53 columns):
Unnamed: 0                   149418 non-null int64
Bearer_Id                    149418 non-null object
Dur. (ms)                    149418 non-null float64
IMSI                         149418 non-null float64
MSISDN                       149418 non-null float64
IMEI                         149418 non-null float64
last_location                149418 non-null object
avg_rtt_dl                   149418 non-null float64
avg_rtt_ul                   149418 non-null float64
throughput_avg_dl_kpbs       149418 non-null float64
throughput_avg_ul_kpbs       149418 non-null float64
retrans_packets_dl_b         149418 non-null float64
retrans_packets_ul_b         149418 non-null float64
tp_dl_below_50kbps_pc        149418 non-null float64
tp_dl_50_250kbps_pc          149418 non-null float64
tp_dl_250kbps_1mbps_pc       149418 non-null float64
 tp_dl_above_1mbps_pc        149418 non-null 

In [13]:
df.describe(include=[np.number]).T

,count,mean,std,min,25%,50%,75%,max
Unnamed: 0,149418.0,75049.416,43292.885,11.0,37597.25,75019.5,112557.75,149999.0
Dur. (ms),149418.0,104679.204,80524.212,7142.0,57675.75,86399.0,132574.75,1859336.0
IMSI,149418.0,208201639520456.8,21488995269.882,204047108489451.0,208201401263249.0,208201546329616.47,208201771619088.0,214074303349628.0
MSISDN,149418.0,41856267630.734,2443484442596.98,33601001722.0,33651278870.5,33663703740.0,33683436668.0,882397108489451.0
IMEI,149418.0,48474820835201.22,22416534351115.61,440015202000.0,35460708884556.5,35722009449082.0,86119704674978.5,99001201327774.0
avg_rtt_dl,149418.0,96.417,536.093,0.0,35.0,45.0,62.0,96923.0
avg_rtt_ul,149418.0,15.328,76.811,0.0,3.0,5.0,11.0,7120.0
throughput_avg_dl_kpbs,149418.0,13310.489,23993.952,0.0,43.0,63.0,19745.75,378160.0
throughput_avg_ul_kpbs,149418.0,1771.059,4626.62,0.0,47.0,63.0,1119.0,58613.0
retrans_packets_dl_b,149418.0,8906380.3,117365534.885,2.0,570226.5,570226.5,570226.5,4294425570.0
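
`IMSI`, `MSISDN` and `IMEI` are identifiers, so the mean and standard deviation reported for them above carry no analytical meaning. One possible way to keep them out of the numeric summary is to cast them to strings first. The sketch below runs on a toy frame built from a few values in the preview, not on the real data, and assumes the identifier columns contain no missing values, as `df.info()` reports.

```python
import numpy as np
import pandas as pd

# Toy frame with two identifier columns and one genuine measurement.
toy = pd.DataFrame({
    "IMSI": [208201908306215.0, 208200314328074.0],
    "MSISDN": [33664473872.0, 33603291937.0],
    "avg_rtt_dl": [217.0, 45.0],
})

# Cast identifiers float -> int64 -> str so they stop being treated as numbers.
# The int64 step assumes no NaN values are present.
id_cols = ["IMSI", "MSISDN"]
for col in id_cols:
    toy[col] = toy[col].astype("int64").astype(str)

# The numeric summary now covers only real measurements.
summary = toy.describe(include=[np.number]).T
print(summary.index.tolist())
```

Keeping identifiers as strings also avoids the trailing `.0` seen in the preview and makes grouping by subscriber or handset less error-prone.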


In [14]:
df.describe(include=[object]).T

,count,unique,top,freq
Bearer_Id,149418,134696,,430
last_location,149418,45538,D41377B,80
phone_company,149418,170,Apple,59565
phone_name,149418,1396,Huawei B528S-23A,19752


In [15]:
df['phone_company'].value_counts()  # look at the 'undefined' values again

Apple                                 59565
Samsung                               40833
Huawei                                34422
undefined                              8983
Sony Mobile Communications Ab           980
                                      ...  
Quanta                                    1
Lephone                                   1
Alif Communications                       1
Civicom Technology (Hk) Co Limited        1
Beijing Shenqi Technology Co Ltd          1
Name: phone_company, Length: 170, dtype: int64

In [16]:
df['phone_name'].value_counts()  # look at the 'undefined' values again

Huawei B528S-23A                              19752
Apple iPhone 6S (A1688)                        9419
Apple iPhone 6 (A1586)                         9023
undefined                                      8983
Apple iPhone 7 (A1778)                         6326
                                              ...  
Tct Mobile Suzho. Alcatel One Touch Pop C5        1
Tcl Communicatio. Alcatel A3                      1
Archos Sa Access 55 3G                            1
Lenovo Moto Z Ve12657645                          1
Tcl Communicatio. Alcatel Idol4 6055K             1
Name: phone_name, Length: 1396, dtype: int64
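
Both `phone_company` and `phone_name` show exactly 8983 `undefined` entries, about 6% of the 149418 rows, which suggests (but does not prove) that the same sessions are missing both fields. A minimal sketch of how this could be checked, using a hypothetical four-row frame in place of `df`:

```python
import pandas as pd

# Hypothetical stand-in for df with the two handset columns.
toy = pd.DataFrame({
    "phone_company": ["Apple", "Samsung", "undefined", "Apple"],
    "phone_name": ["Apple iPhone 6S (A1688)", "Samsung Galaxy", "undefined",
                   "Apple iPhone 7 (A1778)"],
})

# Boolean masks marking the placeholder value in each column.
company_undefined = toy["phone_company"].eq("undefined")
name_undefined = toy["phone_name"].eq("undefined")

# Share of rows with an undefined manufacturer.
share_undefined = company_undefined.mean()

# Number of rows where both fields are undefined at the same time.
both_undefined = int((company_undefined & name_undefined).sum())

print(share_undefined, both_undefined)
```

If the two masks coincide on the real data, the `undefined` rows can be treated as a single "unknown handset" group rather than two separate problems.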

In [24]:
# 'sum' on strings concatenates every phone_name per company with no separator
phones = df.groupby('phone_company')['phone_name'].agg('sum')

In [25]:
phones.head(10)

phone_company
A-Link Telecom International Co Limited    A-Link Telecom I. Cubot X18 PlusA-Link Telecom...
ASUSTeK                                    Asustek Asus Ze550Kl Zenfone2 LaserAsustek Asu...
Acer                                       Acer Liquid Zest PlusAcer M310Acer Liquid Z220...
Adar                                                                Adar Stanley Mobile S231
Alif Communications                                            Alif Communicati. Pulian Cm84
Apple                                      Apple iPhone 6S (A1688)Apple iPhone 6S (A1688)...
Archos SA                                  Archos Sa Diamond SArchos Sa Archos 40 HeliumA...
Archos Sa                                  Archos Sa 50 SaphirArchos Sa 50 SaphirArchos S...
Asustek                                    Asustek Asus Zb555Kl Zenfone Max M1Asustek Asu...
Avenir Telecom                                    Avenir Telecom Pm550SAvenir Telecom Pm550S
Name: phone_name, dtype: object
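
The output above is hard to read for two reasons: the model names are concatenated with no separator and with repeats, and some manufacturers appear under two spellings (`Archos SA` and `Archos Sa`, `ASUSTeK` and `Asustek`). The following is a sketch of one alternative, not the notebook's method. The `company_key` column and the case-folding rule are assumptions introduced for illustration, and the toy rows are loosely based on names visible in the output.

```python
import pandas as pd

# Toy rows imitating the duplicated spellings seen in the grouped output.
toy = pd.DataFrame({
    "phone_company": ["Archos SA", "Archos Sa", "Archos Sa", "Acer"],
    "phone_name": ["Archos Sa Diamond S", "Archos Sa 50 Saphir",
                   "Archos Sa 50 Saphir", "Acer M310"],
})

# Hypothetical normalisation: trim whitespace and ignore letter case.
toy["company_key"] = toy["phone_company"].str.strip().str.casefold()

# List each distinct model once per manufacturer, separated by '; '.
models = toy.groupby("company_key")["phone_name"].agg(
    lambda s: "; ".join(sorted(s.unique()))
)

print(models)
```

Case-folding is a blunt rule: it merges spellings that differ only in capitalisation but will not catch abbreviations or legal-suffix variants, so the merged keys are worth reviewing by eye.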