# Places Analysis
This notebook analyzes Google "PlaceVisit" data with pandas and plots the results with Matplotlib.

In [3]:
%matplotlib inline
import pandas as pd

In [4]:
# Set this to the name of your places file
PLACES_PATH = 'places.csv'
# TODO: allow setting start and end date
places = pd.read_csv(PLACES_PATH, sep='|', encoding='utf-8', parse_dates=['start_timestamp', 'end_timestamp'])
print(f'Data has {places.shape[0]} rows and {places.shape[1]} columns')
places.info()

Data has 2586 rows and 8 columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2586 entries, 0 to 2585
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   lat_e7           2586 non-null   int64              
 1   lon_e7           2586 non-null   int64              
 2   address          2582 non-null   object             
 3   name             1469 non-null   object             
 4   place_id         2586 non-null   object             
 5   start_timestamp  2586 non-null   datetime64[ns, UTC]
 6   end_timestamp    2586 non-null   datetime64[ns, UTC]
 7   confidence       2586 non-null   object             
dtypes: datetime64[ns, UTC](2), int64(2), object(4)
memory usage: 161.8+ KB
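
The TODO above asks for a configurable start and end date. One possible approach is sketched below, under some assumptions: the helper name `filter_by_date` is hypothetical, naive date strings are interpreted as UTC (matching the `datetime64[ns, UTC]` columns), and a visit is selected by its `start_timestamp` only. The toy frame uses made-up values and stands in for `places`.

```python
import pandas as pd

def filter_by_date(df, start=None, end=None):
    """Keep rows whose start_timestamp lies in the half-open range [start, end).

    Hypothetical helper: `start` and `end` are naive date strings that are
    assumed to be in UTC, because the timestamp columns are tz-aware (UTC).
    """
    mask = pd.Series(True, index=df.index)
    if start is not None:
        mask &= df['start_timestamp'] >= pd.Timestamp(start, tz='UTC')
    if end is not None:
        mask &= df['start_timestamp'] < pd.Timestamp(end, tz='UTC')
    return df[mask]

# Toy data with made-up values, standing in for `places`
toy = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'start_timestamp': pd.to_datetime(
        ['2019-12-31T23:00:00Z', '2020-06-01T12:00:00Z', '2021-01-01T00:00:00Z'],
        utc=True,
    ),
})

filtered = filter_by_date(toy, start='2020-01-01', end='2021-01-01')
print(filtered['name'].tolist())
```

With the real data this would be called as `places = filter_by_date(places, start=..., end=...)` right after loading. Comparing a tz-aware column against a naive timestamp raises an error in pandas, which is why the bounds are localized to UTC first.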


In [5]:
print('Top ten places, by number of records')
places['name'].value_counts().head(10)

Top ten places, by number of records


Tennis club Augsburg e.V.      135
rutaNatur                       56
GALERIA Augsburg                54
GALERIA (Karstadt) Augsburg     49
Munich Central Station          46
Augsburg                        41
Augsburg Bohus Center           28
QPLIX GmbH                      26
REWE                            26
Königsplatz                     25
Name: name, dtype: int64
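
Note that `value_counts()` skips missing values by default, and only 1469 of the 2586 rows have a `name`, so 1117 visits are left out of the ranking above. A minimal sketch of one way to include them, assuming it is acceptable to fall back to `address` as the label when `name` is missing (the toy frame below uses made-up values in place of `places`):

```python
import pandas as pd

# Toy data with made-up values, standing in for `places`
toy = pd.DataFrame({
    'name': ['Cafe', None, 'Cafe', None, None],
    'address': ['1 Main St', '2 Home Rd', '1 Main St', '2 Home Rd', '2 Home Rd'],
})

# How many visits have no name at all?
unnamed = toy['name'].isna().sum()

# Use the name where available, otherwise fall back to the address
label = toy['name'].fillna(toy['address'])
counts = label.value_counts()
print(f'{unnamed} visits without a name')
print(counts.head(10))
```

On the real data, the same two lines would be `places['name'].fillna(places['address']).value_counts().head(10)`.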

In [9]:
# Calculate time spent per place.
# Note: grouping by 'name' can give seemingly strange results, because
# personal addresses (where you live) likely don't have a "name" and
# therefore won't show up. Grouping by 'address' includes them.
print('Top ten places, by duration:')
places['duration'] = places['end_timestamp'] - places['start_timestamp']
time_spent = places.groupby('address')['duration'].sum()
time_spent = time_spent.sort_values(ascending=False)
time_spent.head(10)

Top ten places, by duration:


address
Beethovenstraße 3, 86150 Augsburg, Deutschland                 383 days 17:02:22.616000
Oblatterwallstraße 58, 86153 Augsburg, Deutschland              67 days 15:29:45.961000
5 Ledgelawn Ave, Lexington, MA 02420, USA                       21 days 21:57:42.557000
4 Rue Andrioli, 06000 Nice, France                              17 days 18:10:13.066000
Landwehrstraße 25, 97070 Würzburg, Deutschland                  12 days 05:51:46.251000
Alteneschstraße 15, 26135 Oldenburg, Deutschland                10 days 08:31:30.548000
Professor-Steinbacher-Straße 6A, 86161 Augsburg, Deutschland     8 days 23:42:13.158000
Nußbaumstraße 12, 80336 München, Deutschland                     4 days 16:18:07.514000
Bauernfeindstraße 26, 86159 Augsburg, Deutschland                3 days 08:01:41.500000
Oberes Feld 4, 6071 Aldrans, Österreich                          3 days 01:13:58.843000
Name: duration, dtype: timedelta64[ns]
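
The summed durations are `timedelta64[ns]` values, which are precise but awkward to compare or plot. A small sketch of converting them to fractional hours with `.dt.total_seconds()` follows; the toy frame holds made-up values and stands in for `places`, and the choice of hours as the unit is only an assumption about what is convenient.

```python
import pandas as pd

# Toy data with made-up values, standing in for `places`
toy = pd.DataFrame({
    'address': ['2 Home Rd', '1 Main St', '2 Home Rd'],
    'start_timestamp': pd.to_datetime(
        ['2020-01-01T00:00:00Z', '2020-01-02T09:00:00Z', '2020-01-03T00:00:00Z'],
        utc=True,
    ),
    'end_timestamp': pd.to_datetime(
        ['2020-01-01T12:00:00Z', '2020-01-02T10:30:00Z', '2020-01-03T06:00:00Z'],
        utc=True,
    ),
})

toy['duration'] = toy['end_timestamp'] - toy['start_timestamp']

# Sum per address, then convert the timedeltas to fractional hours
total = toy.groupby('address')['duration'].sum()
hours = (total.dt.total_seconds() / 3600).sort_values(ascending=False)
print(hours.head(10))
```

The resulting float Series can be passed directly to `hours.head(10).plot.barh()` if a chart is wanted.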

In [None]:
# TODO: countries
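
One possible starting point for the countries TODO, as a rough sketch only: it assumes the country is the last comma-separated component of `address`, which matches the addresses shown above but is not guaranteed for every row. Country names come back in whatever language the address uses (for example "Deutschland"), and an address without a comma is returned whole. The toy addresses below are made up.

```python
import pandas as pd

# Toy data with made-up addresses, standing in for `places`
toy = pd.DataFrame({
    'address': [
        'Musterstraße 1, 12345 Musterstadt, Deutschland',
        '1 Example Ave, Sampletown, MA 01234, USA',
        'Beispielweg 4, 6000 Beispieldorf, Österreich',
        'Musterstraße 2, 12345 Musterstadt, Deutschland',
        None,  # rows without an address stay missing
    ],
})

# Assumption: the text after the last comma is the country
country = toy['address'].str.rsplit(',', n=1).str[-1].str.strip()
country_counts = country.value_counts()
print(country_counts)
```

A more robust version would likely map the localized names to ISO country codes or reverse-geocode `lat_e7` and `lon_e7`, but that is beyond this sketch.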