### Problem Description

In 2012, URL shortening service Bitly partnered with the US government website USA.gov to provide a feed of anonymous data gathered from users who shorten links ending with .gov or .mil.

The text file is in JSON format. The keys most relevant to this task are described below.

| Key | Description |
|---|---|
| a | Information about the web browser and operating system |
| tz | Time zone |
| r | URL the user came from |
| u | URL the user headed to |
| t | Timestamp when the user started using the website, in UNIX format |
| hc | Timestamp when the user exited the website, in UNIX format |
| cy | City from which the request was initiated |
| ll | Longitude and latitude |

In the cell below, I provide some helper code to make the data easier to understand.

**HINT**: These lines of code may not help with your task at all.

In [21]:
# I will try to retrieve one instance of the file in a list of dictionaries
#import json
#records = [json.loads(line) for line in open('usa.gov_click_data_1.json')]
# Print the first occurrence
#records[0]

FileNotFoundError: [Errno 2] No such file or directory: 'usa.gov_click_data_1.json'
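
The commented-out helper above fails with `FileNotFoundError` when the file is not next to the notebook. A minimal sketch of a more defensive loader follows. The function name `load_records` and the choice to return an empty list for a missing file are assumptions made for illustration, not part of the task.

```python
import json
from pathlib import Path


def load_records(path):
    """Read a JSON-lines file into a list of dictionaries.

    Returns an empty list when the file does not exist (an illustrative choice).
    """
    file = Path(path)
    if not file.is_file():
        return []
    records = []
    with file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records
```

With such a helper, `load_records('usa.gov_click_data_1.json')` would return `[]` instead of raising when the file is absent.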

## Required

Write a script that transforms the JSON files into a DataFrame and writes each file to a separate CSV file in the target directory. Consider the following:

All CSV files must have the following columns:
- web_browser: The web browser that requested the service
- operating_sys: The operating system that initiated the request
- from_url: The main URL the user came from

  **Note**: If the retrieved URL is in a long format such as `http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/wfLQtf`, make it appear in the file in a short format like this: `www.facebook.com`

- to_url: The main URL the user went to, shortened in the same way as `from_url`
- city: The city from which the request was sent
- longitude: The longitude from which the request was sent
- latitude: The latitude from which the request was sent
- time_zone: The time zone that the city follows
- time_in: Time when the request started
- time_out: Time when the request ended

**NOTE**:

Because some records in the file are incomplete, you may encounter NaN values during your transformation. Make sure that the final DataFrames contain no NaNs at all.
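
One way to verify the no-NaN requirement before writing a file is sketched below. The helper name `assert_no_missing` and the sample frame are hypothetical. The check also treats empty strings as missing, on the assumption that some fields may hold blanks rather than NaN.

```python
import pandas as pd


def assert_no_missing(df):
    """Raise ValueError if any cell is NaN or an empty/whitespace-only string."""
    # Flag string cells that contain nothing but whitespace
    blank = df.apply(lambda col: col.map(lambda v: isinstance(v, str) and v.strip() == ""))
    missing = df.isna() | blank
    per_column = missing.any()
    bad_columns = list(per_column[per_column].index)
    if bad_columns:
        raise ValueError(f"Missing values found in columns: {bad_columns}")


# Hypothetical sample with every gap already filled
sample = pd.DataFrame({"city": ["Provo", "Unknown"],
                       "time_zone": ["America/Denver", "Unknown"]})
assert_no_missing(sample)  # passes silently
```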

# Let's Go..

### Step 1: Load the Data

In [22]:
# Import the required libraries:

import pandas as pd
import numpy as np
import os
import re
import json
import warnings
import argparse
import time
from urllib.parse import urlparse
from datetime import datetime as dt

In [105]:
print("Enter arguments:" + "\n" + "-u for unix timestamp format")

Enter arguments:
-u for unix timestamp format


In [23]:
# Handling the Arguments
parser = argparse.ArgumentParser (description="Dina Hosny Python")
#parser.add_argument('-i','--inputPath',help="The path for the input file",type=str)
#parser.add_argument('-o','--outputPath',help="The path for the output file",type=str)
parser.add_argument('-u','--unix',help="if passed the time will be kept in UNIX format", action="store_true")
args, unknown = parser.parse_known_args()
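
If the script is later run from the command line instead of a notebook, the commented-out `-i` and `-o` options could be enabled so that no `input()` prompts are needed. The sketch below assumes that setup. The function name `build_parser` and the description string are hypothetical, and the argument list is passed explicitly only to keep the example self-contained.

```python
import argparse


def build_parser():
    """Create a parser with the input path, output path and UNIX-time flag."""
    parser = argparse.ArgumentParser(description="Transform Bitly click data to CSV")
    parser.add_argument("-i", "--inputPath", type=str, help="path of the input JSON file")
    parser.add_argument("-o", "--outputPath", type=str, help="directory for the output CSV")
    parser.add_argument("-u", "--unix", action="store_true",
                        help="keep timestamps in UNIX format")
    return parser


# An explicit list keeps the example independent of the notebook's own argv
args = build_parser().parse_args(["-i", "data/usa.gov_click_data.json", "-o", "data", "-u"])
print(args.inputPath, args.outputPath, args.unix)
```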

In [24]:
# Record the Start Time:
start = time.time()

In [25]:
# Take the JSON file path from the user and save it into variable:
file_path = input('Enter a file path: ')
#file_path = args.inputPath

Enter a file path: data/usa.gov_click_data.json


In [29]:
# Print the number of records and of duplicated records:
with open(file_path) as json_file:
    records = [json.loads(line) for line in json_file]

Unique_records = []

for record in records:
    if record not in Unique_records:
        Unique_records.append(record)

# Count with len(); a manual counter that starts at 1 over-counts by one
print("File contains " + str(len(records)) + " records" + "\n")
if len(Unique_records) < len(records):
    print("There are: " + str(len(records) - len(Unique_records)) + " duplicate values in the file")
else:
    print("No duplicate values in the file")

File contains 12 records

No duplicate values in the file
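
The `record not in Unique_records` test scans the whole list for every record, which can become slow on large files. A possible alternative, sketched below, serializes each record to a canonical string so that a set can be used for the lookup. The function name `find_unique` and the sample data are hypothetical.

```python
import json


def find_unique(records):
    """Return records with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for record in records:
        # sort_keys gives the same string regardless of key order
        key = json.dumps(record, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


sample = [{"a": 1, "b": [1, 2]}, {"b": [1, 2], "a": 1}, {"a": 2}]
print(len(sample) - len(find_unique(sample)), "duplicate(s)")
```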


In [30]:
# Load the data into a DataFrame:
# Use orient='records' and lines=True because the records are separated by '\n'
first_df = pd.read_json(file_path, orient='records', lines=True)

In [12]:
# Check the dataframe:
first_df

Unnamed: 0,a,c,nk,tz,gr,g,h,l,al,hh,r,u,t,hc,cy,ll
0,Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...,US,1,America/New_York,MA,A6qOVH,wfLQtf,orofrog,"en-US,en;q=0.8",1.usa.gov,http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/...,http://www.ncbi.nlm.nih.gov/pubmed/22415991,1333307030,1333307037,Danvers,"[42.576698, -70.954903]"
1,GoogleMaps/RochesterNY,US,0,America/Denver,UT,mwszkS,mwszkS,bitly,,j.mp,http://www.AwareMap.com/,http://www.monroecounty.gov/etc/911/rss.php,1331923249,1308262393,Provo,"[40.218102, -111.613297]"
2,Mozilla/4.0 (Windows NT 6.1; MSIE 8.0; Windows...,US,1,America/New_York,DC,xxr3Qb,xxr3Qb,bitly,en-US,1.usa.gov,http://t.co/03elZC4Q,http://boxer.senate.gov/en/press/releases/0316...,1333407030,1333407035,Washington,"[38.9007, -77.043098]"
3,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...,BR,0,America/Sao_Paulo,27,zCaLwp,zUtuOu,alelex88,pt-br,1.usa.gov,direct,http://apod.nasa.gov/apod/ap120312.html,1333507030,1333507044,Braz,"[-23.549999, -46.616699]"
4,Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...,US,0,America/New_York,MA,9b6kNl,9b6kNl,bitly,"en-US,en;q=0.8",bit.ly,http://www.shrewsbury-ma.gov/selco/,http://www.shrewsbury-ma.gov/egov/gallery/1341...,1333607030,1333607039,Shrewsbury,"[42.286499, -71.714699]"
5,Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...,US,0,America/New_York,MA,axNK8c,axNK8c,bitly,"en-US,en;q=0.8",bit.ly,http://www.shrewsbury-ma.gov/selco/,http://www.shrewsbury-ma.gov/egov/gallery/1341...,1333707030,1333707048,Shrewsbury,"[42.286499, -71.714699]"
6,Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1...,PL,0,Europe/Warsaw,77,wcndER,zkpJBR,bnjacobs,"pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4",1.usa.gov,http://plus.url.google.com/url?sa=z&n=13319232...,http://www.nasa.gov/mission_pages/nustar/main/...,1333807030,1333807040,Luban,"[51.116699, 15.2833]"
7,Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/2...,,0,,,wcndER,zkpJBR,bnjacobs,"bg,en-us;q=0.7,en;q=0.3",1.usa.gov,http://www.facebook.com/,http://www.nasa.gov/mission_pages/nustar/main/...,1334007036,13340070364,,
8,Opera/9.80 (Ubuntu 14.04.6; Linux zbov; U; en)...,,0,,,wcndER,zkpJBR,bnjacobs,"en-US, en",1.usa.gov,http://www.facebook.com/l.php?u=http%3A%2F%2F1...,http://www.nasa.gov/mission_pages/nustar/main/...,1333907030,1333907042,,
9,Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...,,0,,,zCaLwp,zUtuOu,alelex88,"pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4",1.usa.gov,http://t.co/o1Pd0WeV,http://apod.nasa.gov/apod/ap120312.html,1334007030,1334007040,,


### Step 2: Exploring Data:

In [31]:
# Explore 'a' column that contains the browser and operating system type: 
first_df['a']

0     Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
1                                GoogleMaps/RochesterNY
2     Mozilla/4.0 (Windows NT 6.1; MSIE 8.0; Windows...
3     Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
4     Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
5     Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
6     Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1...
7     Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/2...
8     Opera/9.80 (Ubuntu 14.04.6; Linux zbov; U; en)...
9     Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
10    Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2)...
11    Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4...
Name: a, dtype: object

In [32]:
# Explore 'r' column that contains the from url:
first_df['r']

0     http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/...
1                              http://www.AwareMap.com/
2                                  http://t.co/03elZC4Q
3                                                direct
4                   http://www.shrewsbury-ma.gov/selco/
5                   http://www.shrewsbury-ma.gov/selco/
6     http://plus.url.google.com/url?sa=z&n=13319232...
7                              http://www.facebook.com/
8     http://www.facebook.com/l.php?u=http%3A%2F%2F1...
9                                  http://t.co/o1Pd0WeV
10                                               direct
11                                 http://t.co/ND7SoPyo
Name: r, dtype: object

In [33]:
# Explore 'u' column that contains the to url:
first_df['u']

0           http://www.ncbi.nlm.nih.gov/pubmed/22415991
1           http://www.monroecounty.gov/etc/911/rss.php
2     http://boxer.senate.gov/en/press/releases/0316...
3               http://apod.nasa.gov/apod/ap120312.html
4     http://www.shrewsbury-ma.gov/egov/gallery/1341...
5     http://www.shrewsbury-ma.gov/egov/gallery/1341...
6     http://www.nasa.gov/mission_pages/nustar/main/...
7     http://www.nasa.gov/mission_pages/nustar/main/...
8     http://www.nasa.gov/mission_pages/nustar/main/...
9               http://apod.nasa.gov/apod/ap120312.html
10    https://www.nysdot.gov/rexdesign/design/commun...
11    http://oversight.house.gov/wp-content/uploads/...
Name: u, dtype: object

In [34]:
# Explore 'cy' column that contains the city info:
first_df['cy']

0        Danvers
1          Provo
2     Washington
3           Braz
4     Shrewsbury
5     Shrewsbury
6          Luban
7            NaN
8            NaN
9            NaN
10       Seattle
11    Washington
Name: cy, dtype: object

In [35]:
# Explore 'll' column that contains the longitude and latitude info in a list:
first_df['ll']

0      [42.576698, -70.954903]
1     [40.218102, -111.613297]
2        [38.9007, -77.043098]
3     [-23.549999, -46.616699]
4      [42.286499, -71.714699]
5      [42.286499, -71.714699]
6         [51.116699, 15.2833]
7                          NaN
8                          NaN
9                          NaN
10      [47.5951, -122.332603]
11     [38.937599, -77.092796]
Name: ll, dtype: object

In [36]:
# Explore 'tz' column that contains the time zone info:
first_df['tz']

0        America/New_York
1          America/Denver
2        America/New_York
3       America/Sao_Paulo
4        America/New_York
5        America/New_York
6           Europe/Warsaw
7                        
8                        
9                        
10    America/Los_Angeles
11       America/New_York
Name: tz, dtype: object

In [37]:
# Explore 't' column that contains the time-in timestamp:
first_df['t']

0     1333307030
1     1331923249
2     1333407030
3     1333507030
4     1333607030
5     1333707030
6     1333807030
7     1334007036
8     1333907030
9     1334007030
10    1334107030
11    1334207040
Name: t, dtype: int64

In [38]:
# Explore 'hc' column that contains the time-out timestamp:
first_df['hc']

0      1333307037
1      1308262393
2      1333407035
3      1333507044
4      1333607039
5      1333707048
6      1333807040
7     13340070364
8      1333907042
9      1334007040
10     1334107040
11     1334207043
Name: hc, dtype: int64

### Step 3: Working with data

In [51]:
# Create the output dataframe that contains the required data:
bitly_data = pd.DataFrame()

In [52]:
# Check the creation:
bitly_data 

#### 1- Extract the Web Browser type:

In [53]:
# Create web browser column in the output dataframe that contains the web browser type
# Extract the web browser type from the column 'a'
# Web browser type is the first word in the column and ends in '/' 
bitly_data['web_browser'] = first_df['a'].str.split('/').str[0]

In [54]:
# Check the data:
bitly_data['web_browser']

0        Mozilla
1     GoogleMaps
2        Mozilla
3        Mozilla
4        Mozilla
5        Mozilla
6        Mozilla
7        Mozilla
8          Opera
9        Mozilla
10       Mozilla
11       Mozilla
Name: web_browser, dtype: object

In [55]:
# Mozilla and Opera are web browsers, while GoogleMaps is not
# Label external programs that are not web browsers
bitly_data['web_browser'] = bitly_data['web_browser'].apply(lambda x: x if x in ['Mozilla', 'Opera'] else 'External Program' )

In [56]:
# Check the data:
bitly_data['web_browser']

0              Mozilla
1     External Program
2              Mozilla
3              Mozilla
4              Mozilla
5              Mozilla
6              Mozilla
7              Mozilla
8                Opera
9              Mozilla
10             Mozilla
11             Mozilla
Name: web_browser, dtype: object

#### 2- Extract the Operating System type:

In [57]:
# Create operating_sys column in the output dataframe that contains the operating system type
# Extract the operating system type from the column 'a'
# Operating system type is found between '(' and ')'
bitly_data['operating_sys'] = first_df['a'].str.split('(').str[1].str.split(')').str[0]

In [58]:
# Check the data:
bitly_data['operating_sys']

# The extracted operating system type contains extra data, which will be omitted

0                                 Windows NT 6.1; WOW64
1                                                   NaN
2     Windows NT 6.1; MSIE 8.0; Windows NT 6.1; WOW6...
3                      Macintosh; Intel Mac OS X 10_6_8
4                                 Windows NT 6.1; WOW64
5                                 Windows NT 6.1; WOW64
6                                        Windows NT 5.1
7                              Windows NT 6.1; rv:2.0.1
8                     Ubuntu 14.04.6; Linux zbov; U; en
9                                 Windows NT 6.1; WOW64
10                     Windows NT 6.1; WOW64; rv:10.0.2
11    Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1...
Name: operating_sys, dtype: object

In [59]:
# Omit the extra data from the operating system column
# The useful part, the operating system name and version, comes first and ends with ';'
bitly_data['operating_sys'] = bitly_data['operating_sys'].str.split(';').str[0]

In [60]:
# Check the data:
bitly_data['operating_sys']

# Looks better :)

0     Windows NT 6.1
1                NaN
2     Windows NT 6.1
3          Macintosh
4     Windows NT 6.1
5     Windows NT 6.1
6     Windows NT 5.1
7     Windows NT 6.1
8     Ubuntu 14.04.6
9     Windows NT 6.1
10    Windows NT 6.1
11         Macintosh
Name: operating_sys, dtype: object

In [61]:
# Fill the NaN values with 'Unknown'
bitly_data['operating_sys'] = bitly_data['operating_sys'].fillna('Unknown')

In [62]:
# Check the final result:
bitly_data['operating_sys']

0     Windows NT 6.1
1            Unknown
2     Windows NT 6.1
3          Macintosh
4     Windows NT 6.1
5     Windows NT 6.1
6     Windows NT 5.1
7     Windows NT 6.1
8     Ubuntu 14.04.6
9     Windows NT 6.1
10    Windows NT 6.1
11         Macintosh
Name: operating_sys, dtype: object
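
The chained `split` calls above can also be expressed as a single regular expression that falls back to a default when no parentheses are present. The sketch below is one possible form. The function name `extract_os` is hypothetical, and the user-agent strings are shortened samples.

```python
import re

# First token inside the first pair of parentheses, up to ';' or ')'
_OS_PATTERN = re.compile(r"\(([^;)]+)")


def extract_os(user_agent, default="Unknown"):
    """Return the first parenthesized token of a user-agent string."""
    match = _OS_PATTERN.search(user_agent)
    return match.group(1).strip() if match else default


print(extract_os("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit"))  # Windows NT 6.1
print(extract_os("GoogleMaps/RochesterNY"))  # Unknown
```

Applied to the notebook's data, this could be used as `first_df['a'].apply(extract_os)`, which would make the separate `fillna('Unknown')` step unnecessary.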

#### 3- Extract From URL:

In [63]:
# Create from_url column in the output dataframe that contains the URL that user came from
# Extract the from URL from the column 'r'
# The full URL is too long, so keep only the host name, which sits between the second and third '/'

bitly_data['from_url'] = first_df['r'].str.split('/').str[2]

In [64]:
# Check the data:
bitly_data['from_url']

0          www.facebook.com
1          www.AwareMap.com
2                      t.co
3                       NaN
4     www.shrewsbury-ma.gov
5     www.shrewsbury-ma.gov
6       plus.url.google.com
7          www.facebook.com
8          www.facebook.com
9                      t.co
10                      NaN
11                     t.co
Name: from_url, dtype: object

In [65]:
# Fill the NaN values with 'Direct'
bitly_data['from_url'] = bitly_data['from_url'].fillna('Direct')

In [66]:
# Check the final result:
bitly_data['from_url']

0          www.facebook.com
1          www.AwareMap.com
2                      t.co
3                    Direct
4     www.shrewsbury-ma.gov
5     www.shrewsbury-ma.gov
6       plus.url.google.com
7          www.facebook.com
8          www.facebook.com
9                      t.co
10                   Direct
11                     t.co
Name: from_url, dtype: object
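
Splitting on `'/'` works for the URLs shown here, but `urlparse`, which is already imported in Step 1, expresses the same idea more directly. The following is a minimal sketch; the function name `short_url` is hypothetical.

```python
from urllib.parse import urlparse


def short_url(url, default="Direct"):
    """Return the host part of a URL, or `default` when there is none."""
    host = urlparse(url).netloc
    return host if host else default


print(short_url("http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/wfLQtf"))  # www.facebook.com
print(short_url("direct"))  # Direct
```

On the DataFrame this could be applied as `first_df['r'].apply(short_url)`, which handles the `direct` entries without a separate `fillna` step.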

#### 4- Extract To URL:

In [67]:
# Create to_url column in the output dataframe that contains the URL that user went to
# Extract the to URL from the column 'u'
# The full URL is too long, so keep only the host name, which sits between the second and third '/'

bitly_data['to_url'] = first_df['u'].str.split('/').str[2]

In [68]:
# Check the data:
bitly_data['to_url']

0      www.ncbi.nlm.nih.gov
1      www.monroecounty.gov
2          boxer.senate.gov
3             apod.nasa.gov
4     www.shrewsbury-ma.gov
5     www.shrewsbury-ma.gov
6              www.nasa.gov
7              www.nasa.gov
8              www.nasa.gov
9             apod.nasa.gov
10           www.nysdot.gov
11      oversight.house.gov
Name: to_url, dtype: object

#### 5- Extract City

In [69]:
# Create city column in the output dataframe that contains the city that request was sent from
# Extract the city from the column 'cy'

bitly_data['city'] = first_df['cy']

In [70]:
# Check the data:
bitly_data['city']

0        Danvers
1          Provo
2     Washington
3           Braz
4     Shrewsbury
5     Shrewsbury
6          Luban
7            NaN
8            NaN
9            NaN
10       Seattle
11    Washington
Name: city, dtype: object

In [71]:
# Fill the NaN values with 'Unknown':
bitly_data['city'] = bitly_data['city'].fillna('Unknown')

In [72]:
# Check the final result:
bitly_data['city']

0        Danvers
1          Provo
2     Washington
3           Braz
4     Shrewsbury
5     Shrewsbury
6          Luban
7        Unknown
8        Unknown
9        Unknown
10       Seattle
11    Washington
Name: city, dtype: object

#### 6- Extract Longitude:

In [73]:
# Create longitude column in the output dataframe that contains longitude info
# Extract the longitude from the column 'll'
# The column contains a list; the first item is taken as the longitude

bitly_data['longitude'] = first_df['ll'].str[0]

In [74]:
# Check the data:
bitly_data['longitude']

0     42.576698
1     40.218102
2     38.900700
3    -23.549999
4     42.286499
5     42.286499
6     51.116699
7           NaN
8           NaN
9           NaN
10    47.595100
11    38.937599
Name: longitude, dtype: float64

In [75]:
# Fill the missing values with 'Not Detected'
bitly_data['longitude'] = bitly_data['longitude'].fillna('Not Detected')

In [76]:
# Check the result:
bitly_data['longitude']

0        42.576698
1        40.218102
2          38.9007
3       -23.549999
4        42.286499
5        42.286499
6        51.116699
7     Not Detected
8     Not Detected
9     Not Detected
10         47.5951
11       38.937599
Name: longitude, dtype: object

#### 7- Extract the latitude:

In [77]:
# Create latitude column in the output dataframe that contains latitude info
# Extract the latitude from the column 'll'
# The column contains a list; the second item is taken as the latitude

bitly_data['latitude'] = first_df['ll'].str[1]

In [78]:
# Check the data:
bitly_data['latitude']

0     -70.954903
1    -111.613297
2     -77.043098
3     -46.616699
4     -71.714699
5     -71.714699
6      15.283300
7            NaN
8            NaN
9            NaN
10   -122.332603
11    -77.092796
Name: latitude, dtype: float64

In [79]:
# Fill the missing values with 'Not Detected'
bitly_data['latitude'] = bitly_data['latitude'].fillna('Not Detected')

In [80]:
# Check the data:
bitly_data['latitude']

0       -70.954903
1      -111.613297
2       -77.043098
3       -46.616699
4       -71.714699
5       -71.714699
6          15.2833
7     Not Detected
8     Not Detected
9     Not Detected
10     -122.332603
11      -77.092796
Name: latitude, dtype: object
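
Both coordinate columns can be produced in one pass with a small helper, sketched below under the assumption that every valid `ll` entry is a two-item list. The name `split_coordinates` is hypothetical. It may also be worth checking the order of the two values: second items such as -111.613297 and -122.332603 fall outside the valid latitude range of -90 to 90, which suggests the list may be ordered latitude first, then longitude.

```python
def split_coordinates(ll, default="Not Detected"):
    """Split a two-item coordinate list into its first and second values.

    Anything that is not a two-item list or tuple (for example NaN)
    yields the default for both positions.
    """
    if isinstance(ll, (list, tuple)) and len(ll) == 2:
        return ll[0], ll[1]
    return default, default


first, second = split_coordinates([42.576698, -70.954903])
print(first, second)
print(split_coordinates(float("nan")))
```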

#### 8- Extract the Time Zone:

In [81]:
# Create time_zone column in the output dataframe that contains the time zone that city follows
# Extract the time zone from the column 'tz'

bitly_data['time_zone'] = first_df['tz']

In [82]:
# Check the data:
bitly_data['time_zone']

0        America/New_York
1          America/Denver
2        America/New_York
3       America/Sao_Paulo
4        America/New_York
5        America/New_York
6           Europe/Warsaw
7                        
8                        
9                        
10    America/Los_Angeles
11       America/New_York
Name: time_zone, dtype: object

In [106]:
# Missing values are stored as empty strings instead of NaN:
# Replace the empty values with 'Unknown'
bitly_data['time_zone'] = bitly_data['time_zone'].replace('', 'Unknown')

#### 9- Extract Time In:

In [83]:
# Create time_in column in the output dataframe that contains when request started
# Extract the time in from the column 't'

bitly_data['time_in'] = first_df['t']

In [98]:
# Check the data:
bitly_data['time_in']

0     1333307030
1     1331923249
2     1333407030
3     1333507030
4     1333607030
5     1333707030
6     1333807030
7     1334007036
8     1333907030
9     1334007030
10    1334107030
11    1334207040
Name: time_in, dtype: int64

In [84]:
# Keep unix timestamp if argument -u passed:
if args.unix:
    bitly_data['time_in'] = first_df['t']

# If not passed, convert the data to datetime using fromtimestamp():
else:
    bitly_data['time_in'] = bitly_data['time_in'].apply(lambda x : dt.fromtimestamp(x))

In [85]:
# Check the final result:
bitly_data['time_in']

0    2012-04-01 21:03:50
1    2012-03-16 20:40:49
2    2012-04-03 00:50:30
3    2012-04-04 04:37:10
4    2012-04-05 08:23:50
5    2012-04-06 12:10:30
6    2012-04-07 15:57:10
7    2012-04-09 23:30:36
8    2012-04-08 19:43:50
9    2012-04-09 23:30:30
10   2012-04-11 03:17:10
11   2012-04-12 07:04:00
Name: time_in, dtype: datetime64[ns]
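
Note that `dt.fromtimestamp(x)` without a time zone converts to the local time of the machine running the notebook, so the values above depend on where the code runs. A minimal sketch of a conversion that gives the same result everywhere is shown below; the function name `to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(unix_seconds):
    """Convert a UNIX timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


print(to_utc(1333307030).isoformat())  # 2012-04-01T19:03:50+00:00
```

The first row above shows 21:03:50 for the same timestamp, which is consistent with a machine set two hours ahead of UTC.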

#### 10- Extract Time Out

In [86]:
# Create time_out column in the output dataframe that contains when request ended
# Extract the time out from the column 'hc'

bitly_data['time_out'] = first_df['hc']

In [87]:
# Check the data:
bitly_data['time_out']

0      1333307037
1      1308262393
2      1333407035
3      1333507044
4      1333607039
5      1333707048
6      1333807040
7     13340070364
8      1333907042
9      1334007040
10     1334107040
11     1334207043
Name: time_out, dtype: int64

In [88]:
# Keep unix timestamp if argument -u passed:
if args.unix:
    bitly_data['time_out'] = first_df['hc']

# If not passed, convert the data to datetime using fromtimestamp():
else:
    bitly_data['time_out'] = bitly_data['time_out'].apply(lambda x : dt.fromtimestamp(x))

In [89]:
# Check the final result:
bitly_data['time_out']

0     2012-04-01 21:03:57
1     2011-06-17 00:13:13
2     2012-04-03 00:50:35
3     2012-04-04 04:37:24
4     2012-04-05 08:23:59
5     2012-04-06 12:10:48
6     2012-04-07 15:57:20
7     2392-09-24 01:06:04
8     2012-04-08 19:44:02
9     2012-04-09 23:30:40
10    2012-04-11 03:17:20
11    2012-04-12 07:04:03
Name: time_out, dtype: object
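
Two of the converted values look implausible if `hc` is read as an exit time, as the task describes it: row 1 ends before it starts, and row 7 ends in the year 2392 because its raw value has an extra digit. A sketch of a simple sanity check on the raw UNIX values follows. The one-day threshold `MAX_SESSION_SECONDS` and the column name `suspicious` are assumptions chosen for illustration.

```python
import pandas as pd

# Three (t, hc) pairs copied from rows 0, 1 and 7 of the data shown above
times = pd.DataFrame({
    "t": [1333307030, 1331923249, 1334007036],
    "hc": [1333307037, 1308262393, 13340070364],
})

MAX_SESSION_SECONDS = 24 * 60 * 60  # hypothetical upper bound of one day

# Flag sessions that end before they start or that last longer than the bound
duration = times["hc"] - times["t"]
times["suspicious"] = (duration < 0) | (duration > MAX_SESSION_SECONDS)
print(times)
```

How flagged rows should be handled (dropped, corrected, or kept with a marker) is a decision the task description leaves open.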

### Step 4: Export Final Data to CSV file:

In [90]:
# Check the final dataframe:
bitly_data

Unnamed: 0,web_browser,operating_sys,from_url,to_url,city,longitude,latitude,time_zone,time_in,time_out
0,Mozilla,Windows NT 6.1,www.facebook.com,www.ncbi.nlm.nih.gov,Danvers,42.576698,-70.954903,America/New_York,2012-04-01 21:03:50,2012-04-01 21:03:57
1,External Program,Unknown,www.AwareMap.com,www.monroecounty.gov,Provo,40.218102,-111.613297,America/Denver,2012-03-16 20:40:49,2011-06-17 00:13:13
2,Mozilla,Windows NT 6.1,t.co,boxer.senate.gov,Washington,38.9007,-77.043098,America/New_York,2012-04-03 00:50:30,2012-04-03 00:50:35
3,Mozilla,Macintosh,Direct,apod.nasa.gov,Braz,-23.549999,-46.616699,America/Sao_Paulo,2012-04-04 04:37:10,2012-04-04 04:37:24
4,Mozilla,Windows NT 6.1,www.shrewsbury-ma.gov,www.shrewsbury-ma.gov,Shrewsbury,42.286499,-71.714699,America/New_York,2012-04-05 08:23:50,2012-04-05 08:23:59
5,Mozilla,Windows NT 6.1,www.shrewsbury-ma.gov,www.shrewsbury-ma.gov,Shrewsbury,42.286499,-71.714699,America/New_York,2012-04-06 12:10:30,2012-04-06 12:10:48
6,Mozilla,Windows NT 5.1,plus.url.google.com,www.nasa.gov,Luban,51.116699,15.2833,Europe/Warsaw,2012-04-07 15:57:10,2012-04-07 15:57:20
7,Mozilla,Windows NT 6.1,www.facebook.com,www.nasa.gov,Unknown,Not Detected,Not Detected,,2012-04-09 23:30:36,2392-09-24 01:06:04
8,Opera,Ubuntu 14.04.6,www.facebook.com,www.nasa.gov,Unknown,Not Detected,Not Detected,,2012-04-08 19:43:50,2012-04-08 19:44:02
9,Mozilla,Windows NT 6.1,t.co,apod.nasa.gov,Unknown,Not Detected,Not Detected,,2012-04-09 23:30:30,2012-04-09 23:30:40


In [91]:
# Check the final dataframe shape:
bitly_data.shape

(12, 10)

In [100]:
# Take the final CSV file path from the user and save it into variable:
output_name= input('Enter the output file name: ')
output_path = input('Enter the path to save CSV in: ')

Enter the output file name: out
Enter the path to save CSV in: data


In [101]:
# Write the data to a CSV file:
# os.path.join picks the right path separator for the current operating system
bitly_data.to_csv(os.path.join(output_path, output_name + '.csv'))
###bitly_data.to_csv(os.path.join(args.outputPath, output_name + '.csv'))
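
Since the task asks for each JSON file to be written to its own CSV file in the target directory, the output name could be derived from the input name instead of being typed in. The sketch below shows one way to do that. The function name `output_path_for` and the directory name `out` are hypothetical.

```python
from pathlib import Path


def output_path_for(json_path, target_dir):
    """Map an input JSON path to <target_dir>/<same file stem>.csv."""
    return Path(target_dir) / (Path(json_path).stem + ".csv")


print(output_path_for("data/usa.gov_click_data.json", "out").as_posix())
# out/usa.gov_click_data.csv
```

The result can be passed straight to `to_csv`, for example `bitly_data.to_csv(output_path_for(file_path, output_path), index=False)`, where `index=False` leaves the row index out of the file.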

In [None]:
# Print the number of rows transformed:
print("The file in the path:", file_path, "was successfully transformed")

print("There were " + str(len(bitly_data.index)) + " rows transformed")

In [102]:
# Record the end time:
end = time.time()

In [103]:
# Print the execution time:
# Difference between start and end time in milliseconds
print("The time of execution of this script is :",
      (end-start) * 10**3, "ms")

The time of execution of this script is : 1060882.7199935913 ms
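
The figure above (about 1,061 seconds) includes the time spent waiting at the `input()` prompts and between cell runs, because `start` was recorded before them. To time only the transformation itself, the work could be wrapped in a function and measured with `time.perf_counter`, as in the sketch below. The helper name `timed` is hypothetical, and `sum` stands in for the real transformation.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


result, elapsed_ms = timed(sum, range(1000))
print(f"result={result}, elapsed={elapsed_ms:.3f} ms")
```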


In [None]:
# End .. :)