# Project #1

#### Project overview

In this project, you will complete five tasks based on a fabricated set of calls and texts exchanged during September 2016. You will use Python to analyze and answer questions about the texts and calls contained in the dataset. Lastly, you will perform a runtime analysis of your solution and determine its efficiency.

#### What will I learn?
In this project, you will:

* Apply your Python knowledge to break down problems into their inputs and outputs.
* Perform an efficiency analysis of your solution.
* Warm up your Python skills for the course.

#### Why this Project?
You'll apply the skills you've learned so far in a more realistic scenario. The five tasks are structured to give you experience with a variety of programming problems. You will receive code review of your work; this personal feedback will help you to improve your solutions.

## I. Solve given tasks

### Task 0

In [1]:
import csv
with open('texts.csv', 'r') as f:
    reader = csv.reader(f)
    texts = list(reader)

with open('calls.csv', 'r') as f:
    reader = csv.reader(f)
    calls = list(reader)

In [None]:
#texts

In [None]:
#calls

**Problem:**

What is the first record of texts and what is the last record of calls?

Print messages:
* "First record of texts, *incoming number* texts *answering number* at time *time*"
* "Last record of calls, *incoming number* calls *answering number* at time *time*, lasting *duration* seconds"

#### Solution

In [None]:
# Add print statements
print("First record of texts, {} texts {} at time {}".format(texts[0][0],
                                                             texts[0][1],
                                                             texts[0][2]))
print("Last record of calls, {} calls {} at time {}, lasting {} seconds".format(calls[-1][0],
                                                                                calls[-1][1],
                                                                                calls[-1][2],
                                                                                calls[-1][3]))

#### Runtime: O(1)

Accessing a list element by index has O(1) time complexity. The solution performs a fixed number of such accesses (the first row of `texts` and the last row of `calls`), so the time complexity is O(1 + 1), which simplifies to O(1).
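As a minimal sketch of the indexing used above, the snippet below parses a tiny in-memory CSV and reads its first and last rows. The rows are invented sample values, not taken from the dataset, and the column order (calling number, receiving number, timestamp, duration) is assumed from the print messages in the problem statement.

```python
import csv
import io

# Hypothetical sample rows, invented for illustration only.
sample_calls_csv = (
    "90000 00001,(080)2222222,01-09-2016 06:01:12,186\n"
    "1403333333,90000 00002,30-09-2016 23:57:15,2151\n"
)

# csv.reader yields each row as a list of strings.
sample_calls = list(csv.reader(io.StringIO(sample_calls_csv)))

# Indexing a list is O(1), no matter how many rows it holds.
first_call = sample_calls[0]
last_call = sample_calls[-1]

message = "Last record of calls, {} calls {} at time {}, lasting {} seconds".format(*last_call)
print(message)
```

Note that every field, including the duration, is read as a string; it must be converted with `int` before doing arithmetic on it.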

### Task 1

**Problem**:

How many different telephone numbers are there in the records?

Print a message:

"There are *count* different telephone numbers in the records."

#### Solution

In [None]:
sending_tel_num_txt_lst = []
receiving_tel_num_txt_lst = []

In [None]:
for item in texts:
    if item[0] not in sending_tel_num_txt_lst:
        sending_tel_num_txt_lst.append(item[0])
    if item[1] not in receiving_tel_num_txt_lst:
        receiving_tel_num_txt_lst.append(item[1])

In [None]:
len(sending_tel_num_txt_lst)

In [None]:
len(receiving_tel_num_txt_lst)

In [None]:
sending_tel_num_calls_lst = []
receiving_tel_num_calls_lst = []

In [None]:
for item in calls:
    if item[0] not in sending_tel_num_calls_lst:
        sending_tel_num_calls_lst.append(item[0])
    if item[1] not in receiving_tel_num_calls_lst:
        receiving_tel_num_calls_lst.append(item[1])

In [None]:
len(sending_tel_num_calls_lst)

In [None]:
len(receiving_tel_num_calls_lst)

In [None]:
unique_tel_num_lst = list(set(sending_tel_num_txt_lst + receiving_tel_num_txt_lst + sending_tel_num_calls_lst + receiving_tel_num_calls_lst))

In [None]:
len(unique_tel_num_lst)

In [None]:
print("There are {} different telephone numbers in the records.".format(len(unique_tel_num_lst)))

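#### Runtime: O(n^2)

The code consists of two for-loops, each of which visits every record once. Inside each loop, however, the `not in` test scans a list, which takes O(n) time in the worst case. The loops therefore run in O(n * n) = O(n^2) time in the worst case, when most numbers are distinct. Storing the numbers in sets instead of lists would make each membership test O(1) on average and bring the total down to O(n).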
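One way to keep the duplicate check at O(1) on average per record is to collect the numbers in a single set. The sketch below is an alternative to the list-based solution above, not a replacement for it; the sample rows are invented, and the names `sample_texts`, `sample_calls` and `unique_numbers` are hypothetical.

```python
# Hypothetical sample records: [sender, receiver, ...]; values are invented.
sample_texts = [
    ["90000 00001", "90000 00002", "01-09-2016 06:03:22"],
    ["90000 00002", "90000 00003", "01-09-2016 06:05:35"],
]
sample_calls = [
    ["90000 00001", "(080)4444444", "01-09-2016 06:01:12", "186"],
    ["1405555555", "90000 00003", "01-09-2016 06:01:59", "2093"],
]

unique_numbers = set()
for record in sample_texts + sample_calls:
    # set.add is O(1) on average, so the whole loop is O(n).
    unique_numbers.add(record[0])
    unique_numbers.add(record[1])

print("There are {} different telephone numbers in the records.".format(len(unique_numbers)))
```

A set silently ignores values it already contains, so no explicit `not in` check is needed.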

### Task 2

**Problem**: 
    
Which telephone number spent the longest time on the phone
during the period? Don't forget that time spent answering a call is
also time spent on the phone.

Print a message:

"*Telephone number* spent the longest time, *total time* seconds, on the phone during September 2016."

#### Solution

In [None]:
# Change datatype from string to integer for the duration column in calls file
for item in calls:
    item[3] = int(item[3])

In [None]:
#calls

In [None]:
# Create a dictionary with unique phone numbers
unique_tel_num_dict = dict.fromkeys(unique_tel_num_lst, 0)

In [None]:
#unique_tel_num_dict

In [None]:
for item in calls:
    unique_tel_num_dict[item[0]] += item[3]
    unique_tel_num_dict[item[1]] += item[3]

In [None]:
#unique_tel_num_dict

In [None]:
max_key = max(unique_tel_num_dict, key=unique_tel_num_dict.get)

In [None]:
max_key

In [None]:
unique_tel_num_dict[max_key]

In [None]:
print("{} spent the longest time, {} seconds, on the phone during September 2016.".format(max_key,
                                                                                          unique_tel_num_dict[max_key]))

#### Runtime: O(n)

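The code consists of two for-loops over `calls`, plus `dict.fromkeys` and `max`, which each make one pass over the unique numbers. Every dictionary update inside the loops is O(1) on average, so each of these steps has time complexity O(n). Therefore, this code has O(n) time complexity overall.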
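The accumulation step can also be written without a pre-built list of unique numbers. The sketch below uses `collections.defaultdict` so that each number starts at zero the first time it is seen; the sample rows are invented and the names `sample_calls`, `total_seconds` and `longest_number` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records: [caller, receiver, timestamp, duration].
sample_calls = [
    ["90000 00001", "(080)4444444", "01-09-2016 06:01:12", "100"],
    ["(080)4444444", "90000 00003", "01-09-2016 06:01:59", "250"],
    ["90000 00003", "90000 00001", "01-09-2016 06:03:51", "40"],
]

total_seconds = defaultdict(int)
for caller, receiver, _timestamp, duration in sample_calls:
    seconds = int(duration)
    # Both parties spend the same time on the phone.
    total_seconds[caller] += seconds
    total_seconds[receiver] += seconds

# max makes a single O(n) pass over the dictionary keys.
longest_number = max(total_seconds, key=total_seconds.get)

print("{} spent the longest time, {} seconds, on the phone during September 2016.".format(
    longest_number, total_seconds[longest_number]))
```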

### Task 3

**Problem:**

(080) is the area code for fixed line telephones in Bangalore.
Fixed line numbers include parentheses, so Bangalore numbers
have the form (080)xxxxxxx.

Part A: Find all of the area codes and mobile prefixes called by people
in Bangalore. In other words, the calls were initiated by "(080)" area code
to the following area codes and mobile prefixes:
 - Fixed lines start with an area code enclosed in brackets. The area
   codes vary in length but always begin with 0.
 - Mobile numbers have no parentheses, but have a space in the middle
   of the number to help readability. The prefix of a mobile number
   is its first four digits, and they always start with 7, 8 or 9.
 - Telemarketers' numbers have no parentheses or space, but they start
   with the area code 140.

Print the answer as part of a message:
"The numbers called by people in Bangalore have codes:"
 *list of codes*
The list of codes should be printed out one per line in lexicographic order with no duplicates.

Part B: What percentage of calls from fixed lines in Bangalore are made
to fixed lines also in Bangalore? In other words, of all the calls made
from a number starting with "(080)", what percentage of these calls
were made to a number also starting with "(080)"?

Print the answer as a part of a message:
"*percentage* percent of calls from fixed lines in Bangalore are calls
to other fixed lines in Bangalore."
The percentage should have 2 decimal digits.

#### Solution:

#### Part A

In [None]:
def return_area_code(tel_number):
    """
    Return the area code or mobile prefix of a telephone number,
    or None if the number matches none of the known formats.
    """

    # Fixed line: the area code is everything up to the closing parenthesis
    if tel_number.startswith('(0'):
        return tel_number.split(')')[0] + ')'

    # Telemarketer: the number starts with the code 140
    if tel_number.startswith('140'):
        return '140'

    # Mobile: contains a space and starts with 7, 8 or 9; the prefix is the first four digits
    if ' ' in tel_number and tel_number.startswith(('7', '8', '9')):
        return tel_number[:4]

    print('No rules to define the area code')
    return None

In [None]:
bang_called_area_codes = []

In [None]:
# Find area codes called by Bangalore telephone numbers
for caller, receiver, *_ in calls:
    if caller.startswith('(080)'):
        bang_called_area_codes.append(return_area_code(receiver))

In [None]:
len(bang_called_area_codes)

In [None]:
# Remove duplicates
bang_called_area_codes = list(set(bang_called_area_codes))

In [None]:
len(bang_called_area_codes)

In [None]:
# Sort list in lexicographic order
bang_called_area_codes.sort()

In [None]:
print("The numbers called by people in Bangalore have codes:")

In [None]:
for code in bang_called_area_codes:
    print(code)

#### Runtime: O(n * logn)

This code consists of two for-loops and one sort. Each for-loop has time complexity O(n), as does removing duplicates with `set`, while sorting the list takes O(n*logn). Therefore, the time complexity for Part A (Task #3) is O(n + n + n + n*logn), which simplifies to O(n*logn).
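The collect, de-duplicate and sort steps can be combined into a single expression. The sketch below assumes a hypothetical helper `extract_code` that mirrors the three rules in the problem statement, and it runs on invented sample rows rather than the real dataset.

```python
def extract_code(tel_number):
    """Hypothetical helper: return the area code or mobile prefix, else None."""
    if tel_number.startswith("(0"):
        # Fixed line: keep everything up to and including the closing parenthesis.
        return tel_number.partition(")")[0] + ")"
    if tel_number.startswith("140"):
        return "140"
    if " " in tel_number and tel_number.startswith(("7", "8", "9")):
        return tel_number[:4]
    return None


# Hypothetical sample records: [caller, receiver]; values are invented.
sample_calls = [
    ["(080)4444444", "98765 43210"],
    ["(080)4444444", "(022)1234567"],
    ["(080)5555555", "1401234567"],
    ["98765 43210", "(044)7654321"],  # not from Bangalore, so ignored
    ["(080)5555555", "98761 00000"],  # same prefix as the first row
]

# The set comprehension removes duplicates in O(n); sorted is O(n log n).
codes = sorted(
    {extract_code(receiver) for caller, receiver in sample_calls
     if caller.startswith("(080)")} - {None}
)

print("The numbers called by people in Bangalore have codes:")
for code in codes:
    print(code)
```

Subtracting `{None}` guards the sort against numbers that match no rule, since Python 3 cannot compare `None` with strings.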

#### Part B

In [None]:
bang_to_all_tel_num_cnt = 0
bang_to_bang_tel_num_cnt = 0

In [None]:
for i in range(len(calls)):
    if calls[i][0].startswith('(080)'):
        bang_to_all_tel_num_cnt += 1
        if calls[i][1].startswith('(080)'):
            bang_to_bang_tel_num_cnt += 1

In [None]:
bang_to_all_tel_num_cnt

In [None]:
bang_to_bang_tel_num_cnt

In [None]:
# Calculate the percentage of calls made to Bangalore from Bangalore
pcnt_to_bang = round((bang_to_bang_tel_num_cnt * 100) / bang_to_all_tel_num_cnt, 2)

In [None]:
pcnt_to_bang

In [None]:
print("{:.2f} percent of calls from fixed lines in Bangalore are calls to other fixed lines in Bangalore.".format(pcnt_to_bang))

#### Runtime: O(n)

We have one for-loop whose body performs constant-time prefix checks and counter increments, so the time complexity is O(n*(1+1)), which simplifies to O(n).
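As a small illustration of the counting and of the two-decimal formatting the task asks for, the sketch below computes the percentage on invented sample rows. The names `sample_calls`, `from_bangalore` and `to_bangalore` are hypothetical.

```python
# Hypothetical sample records: [caller, receiver]; values are invented.
sample_calls = [
    ["(080)1111111", "(080)2222222"],
    ["(080)1111111", "90000 00001"],
    ["(080)2222222", "(022)3333333"],
    ["90000 00001", "(080)1111111"],  # not from a Bangalore fixed line
]

# One pass to select calls made from Bangalore fixed lines.
from_bangalore = [call for call in sample_calls if call[0].startswith("(080)")]
# One pass over that subset to count calls that also end in Bangalore.
to_bangalore = sum(1 for call in from_bangalore if call[1].startswith("(080)"))

percentage = 100 * to_bangalore / len(from_bangalore)

# The :.2f format specifier always prints exactly two decimal digits.
message = ("{:.2f} percent of calls from fixed lines in Bangalore are calls "
           "to other fixed lines in Bangalore.").format(percentage)
print(message)
```

The division assumes at least one call from a Bangalore fixed line; an empty `from_bangalore` would need a guard against division by zero.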

### Task 4

The telephone company wants to identify numbers that might be doing telephone marketing. 
Create a set of possible telemarketers:
* these are numbers that make outgoing calls but never send texts, receive texts or receive incoming calls.

Print a message:
"These numbers could be telemarketers: "
*list of numbers*
The list of numbers should be printed out one per line in lexicographic order with no duplicates.

#### Solution

Build lists of the telephone numbers that make outgoing calls, send texts, receive texts, and receive calls:

In [2]:
tel_num_outgoing_calls = []
tel_num_sent_text = []
tel_num_received_text = []
tel_num_received_calls = []

In [3]:
for i in range(len(calls)):
    tel_num_outgoing_calls.append(calls[i][0])
    tel_num_received_calls.append(calls[i][1])

In [4]:
for i in range(len(texts)):
    tel_num_sent_text.append(texts[i][0])
    tel_num_received_text.append(texts[i][1])

In [5]:
# Remove duplicates
tel_num_outgoing_calls = list(set(tel_num_outgoing_calls))
tel_num_sent_text = list(set(tel_num_sent_text))
tel_num_received_text = list(set(tel_num_received_text))
tel_num_received_calls = list(set(tel_num_received_calls))

In [6]:
len(tel_num_outgoing_calls)

479

In [7]:
len(tel_num_sent_text)

237

In [8]:
len(tel_num_received_text)

230

In [9]:
len(tel_num_received_calls)

497

In [10]:
not_telemarketers = list(set(tel_num_sent_text + tel_num_received_text + tel_num_received_calls))

In [11]:
len(not_telemarketers)

527

In [12]:
potential_telemarketers = []

In [13]:
for i in range(len(tel_num_outgoing_calls)):
    if tel_num_outgoing_calls[i] not in not_telemarketers:
        potential_telemarketers.append(tel_num_outgoing_calls[i])

In [14]:
potential_telemarketers = list(set(potential_telemarketers))

In [15]:
potential_telemarketers.sort()

In [16]:
potential_telemarketers

['(022)37572285',
 '(022)65548497',
 '(022)68535788',
 '(022)69042431',
 '(040)30429041',
 '(044)22020822',
 '(0471)2171438',
 '(0471)6579079',
 '(080)20383942',
 '(080)25820765',
 '(080)31606520',
 '(080)40362016',
 '(080)60463379',
 '(080)60998034',
 '(080)62963633',
 '(080)64015211',
 '(080)69887826',
 '(0821)3257740',
 '1400481538',
 '1401747654',
 '1402316533',
 '1403072432',
 '1403579926',
 '1404073047',
 '1404368883',
 '1404787681',
 '1407539117',
 '1408371942',
 '1408409918',
 '1408672243',
 '1409421631',
 '1409668775',
 '1409994233',
 '74064 66270',
 '78291 94593',
 '87144 55014',
 '90351 90193',
 '92414 69419',
 '94495 03761',
 '97404 30456',
 '97407 84573',
 '97442 45192',
 '99617 25274']

In [17]:
print("These numbers could be telemarketers: ")

These numbers could be telemarketers: 


In [18]:
for number in potential_telemarketers:
    print(number)

(022)37572285
(022)65548497
(022)68535788
(022)69042431
(040)30429041
(044)22020822
(0471)2171438
(0471)6579079
(080)20383942
(080)25820765
(080)31606520
(080)40362016
(080)60463379
(080)60998034
(080)62963633
(080)64015211
(080)69887826
(0821)3257740
1400481538
1401747654
1402316533
1403072432
1403579926
1404073047
1404368883
1404787681
1407539117
1408371942
1408409918
1408672243
1409421631
1409668775
1409994233
74064 66270
78291 94593
87144 55014
90351 90193
92414 69419
94495 03761
97404 30456
97407 84573
97442 45192
99617 25274


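#### Runtime: O(n^2)

The code consists of three for-loops. The first two only append to lists and have time complexity O(n) each, as does removing duplicates with `set`. The third loop, however, tests `not in` against the list `not_telemarketers`, which takes O(n) per test, so that loop is O(n^2) in the worst case. Sorting the result takes O(n*logn). Therefore, the time complexity for the code is O(n + n + n + n^2 + n*logn) = O(n^2). Keeping `not_telemarketers` as a set would make each membership test O(1) on average and reduce the total to O(n*logn).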
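The filtering step maps directly onto a set difference. The sketch below is an alternative formulation on invented sample rows, with hypothetical names throughout; it is not the solution above.

```python
# Hypothetical sample records; values are invented.
sample_calls = [
    ["1401111111", "90000 00001"],
    ["(022)4444444", "90000 00001"],
    ["90000 00001", "(080)2222222"],
    ["(080)3333333", "90000 00002"],
]
sample_texts = [
    ["90000 00002", "(080)3333333"],
]

# Set comprehensions build each group in O(n) with duplicates removed.
callers = {call[0] for call in sample_calls}
call_receivers = {call[1] for call in sample_calls}
text_senders = {text[0] for text in sample_texts}
text_receivers = {text[1] for text in sample_texts}

# Set difference is O(n) on average; sorting dominates at O(n log n).
possible_telemarketers = sorted(
    callers - call_receivers - text_senders - text_receivers
)

print("These numbers could be telemarketers: ")
for number in possible_telemarketers:
    print(number)
```

Because every membership check inside the set difference is O(1) on average, the sort becomes the most expensive step.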

## Reference

Prado, K. S. do. (2020, February 15). Understanding time complexity with Python examples. Medium. https://towardsdatascience.com/understanding-time-complexity-with-python-examples-2bda6e8158a7. 