# Introduction
The aim of this document is to look at theatre in Warsaw and answer questions:
1. Is there any relation between rating of the theatre and number of venues around?
2. Is there any relation between popularity of the theatre and number of venues around?
This issues can be important when you want to establish your own theatre. 

# Data
Data for this paper was downloaded from wikipedia. There is an article in polish about theatres in Warsaw and this article has table of places with adresses. Basic on this table using googlemaps specific latitude and longitude was collected. After that each location was scoped using foursquare in 500m around to find places in each location. 

# Methodology 
Collected data was cleaned, scoped, summarised in many ways to zoom the structure. It was also shown on the folium map where each theatre is located. After that linear regression models were performed to find answer to the questions. 

# Results 
I found no relation between rating and number of venues around. I found also no relation between popularity of ratings and venues around. It seems that popularity and rating of the teatre depends on diffrent factors (like actors, plays, director, etc.) which were not included in this research. 

# Conclusions
Before final conclusion there i think it's worth to look for type of venues and rating and popularity (in fact this was the main purpose of this analysis). Unfortunately, foursquare classifies too many types of venues (around 200) and it can't be used to include 200 predictors in model with 50 observations. 

In [None]:
import pandas as pd
import numpy as np
import json
from bs4 import BeautifulSoup
from matplotlib import pyplot as plt
from matplotlib import colors as colors
import sklearn.cluster
import folium
import requests
from IPython.display import clear_output
import googlemaps

In [165]:
url_theatres = "https://pl.wikipedia.org/wiki/Teatry_w_Warszawie"
theatres = requests.get(url_theatres).text

soup = BeautifulSoup(theatres)
table = soup.find_all('table')[0]
t_df = pd.read_html(str(table))[0]
t_df.drop(columns = ['Unnamed: 5'], inplace=True)
t_df

Unnamed: 0,teatr,rodzaj,prowadzący,siedziba,www
0,Mazowiecki Teatr Muzyczny Operetka,teatr operetkowy,Włodzimierz Izban,"Bielany, ul. Kolumbijska 3",www.mteatr.pl
1,Nowy Teatr,teatr dramatyczny,Krzysztof Warlikowski,"Mokotów, ul. Madalińskiego 10/16",www.nowyteatr.org
2,Och-Teatr,teatr dramatyczny,Katarzyna Błachiewicz (dyrektor teatrów Fundac...,"Ochota, ul. Grójecka 65",www.ochteatr.com.pl
3,Stara ProchOFFnia,,Wojciech Feliksiak,"Śródmieście, ul. Boleść 2",www.staraprochoffnia.scek.pl
4,Studio Buffo,teatr muzyczny,"Janusz Stokłosa (prezes), Janusz Józefowicz (d...","Śródmieście, ul. Konopnickiej 6",www.studiobuffo.com.pl
5,Studium Teatralne,,,"Praga-Południe, ul. Lubelska 30/32",www.studiumteatralne.pl
6,Teatr 6. piętro,teatr dramatyczny,Michał Żebrowski i Eugeniusz Korin,"Śródmieście, Pałac Kultury i Nauki – wejście o...",www.teatr6pietro.pl
7,Teatr Ateneum,teatr dramatyczny,Ryszard Markow,"Śródmieście, ul. Jaracza 2",www.teatrateneum.pl
8,Teatr Baj,teatr lalek,Ewa Piotrowska,"Praga-Północ, ul. Jagiellońska 28",www.teatrbaj.waw.pl
9,Teatr Bajka,teatr komediowy,Natalia Pietkiewicz-Szwarc,"Śródmieście, ul. Marszałkowska 138",www.teatrbajka.pl


In [166]:
t_df['rodzaj'].value_counts().to_frame(name="Count")

Unnamed: 0,Count
teatr dramatyczny,21
teatr komediowy,6
teatr muzyczny,3
teatr lalek,2
teatr operowy,2
teatr offowy,1
teatr operetkowy,1
teatr familijny,1
teatr niezależny,1
teatr dziecięcy,1


Unnamed: 0,teatr,rodzaj,prowadzący,siedziba,www,lat,lon
0,Mazowiecki Teatr Muzyczny Operetka,teatr operetkowy,Włodzimierz Izban,"Bielany, ul. Kolumbijska 3",www.mteatr.pl,52.302187,20.933281
1,Nowy Teatr,teatr dramatyczny,Krzysztof Warlikowski,"Mokotów, ul. Madalińskiego 10/16",www.nowyteatr.org,52.205248,21.01943
2,Och-Teatr,teatr dramatyczny,Katarzyna Błachiewicz (dyrektor teatrów Fundac...,"Ochota, ul. Grójecka 65",www.ochteatr.com.pl,52.214206,20.980168
3,Stara ProchOFFnia,,Wojciech Feliksiak,"Śródmieście, ul. Boleść 2",www.staraprochoffnia.scek.pl,52.252101,21.011984
4,Studio Buffo,teatr muzyczny,"Janusz Stokłosa (prezes), Janusz Józefowicz (d...","Śródmieście, ul. Konopnickiej 6",www.studiobuffo.com.pl,52.228337,21.026031
5,Studium Teatralne,,,"Praga-Południe, ul. Lubelska 30/32",www.studiumteatralne.pl,52.250295,21.051736
6,Teatr 6. piętro,teatr dramatyczny,Michał Żebrowski i Eugeniusz Korin,"Śródmieście, Pałac Kultury i Nauki – wejście o...",www.teatr6pietro.pl,52.231838,21.005995
7,Teatr Ateneum,teatr dramatyczny,Ryszard Markow,"Śródmieście, ul. Jaracza 2",www.teatrateneum.pl,52.237304,21.032846
8,Teatr Baj,teatr lalek,Ewa Piotrowska,"Praga-Północ, ul. Jagiellońska 28",www.teatrbaj.waw.pl,52.252403,21.034744
9,Teatr Bajka,teatr komediowy,Natalia Pietkiewicz-Szwarc,"Śródmieście, ul. Marszałkowska 138",www.teatrbajka.pl,52.236071,21.008587


In [206]:
def find_latlon2(address):
    try:
        search_location = googlemaps.Client(key = "AIzaSyAmrWggSZIp0VABpc2gMp9fD2JQN5WE7Yk").places(address)
        lat = search_location['results'][0]['geometry']['location']['lat']
        lon = search_location['results'][0]['geometry']['location']['lng']
        rating = search_location['results'][0]['rating']
        user_ratings_total = search_location['results'][0]['user_ratings_total']
    except:
        search_location = googlemaps.Client(key = "AIzaSyAmrWggSZIp0VABpc2gMp9fD2JQN5WE7Yk").geocode(address)
        lat = search_location[0]['geometry']['location']['lat']
        lon = search_location[0]['geometry']['location']['lng']
        rating = "NaN"
        user_ratings_total = "NaN"
    return({'lat': round(lat,7), 'lon': round(lon,7), 'rating': rating, 'user_ratings_total': user_ratings_total})



lst = []
for teatr, address in zip(t_df['teatr'], t_df['siedziba']):
    lst.append(find_latlon2(address + " " +  teatr))
lst

[{'lat': 52.2767531,
  'lon': 20.9480547,
  'rating': 4.5,
  'user_ratings_total': 65},
 {'lat': 52.2052251,
  'lon': 21.0199031,
  'rating': 4.7,
  'user_ratings_total': 912},
 {'lat': 52.214206,
  'lon': 20.980168,
  'rating': 4.7,
  'user_ratings_total': 2299},
 {'lat': 52.2520335,
  'lon': 21.0118501,
  'rating': 4.5,
  'user_ratings_total': 105},
 {'lat': 52.2281469,
  'lon': 21.026145,
  'rating': 4.2,
  'user_ratings_total': 114},
 {'lat': 52.2504595, 'lon': 21.0509099, 'rating': 4, 'user_ratings_total': 7},
 {'lat': 52.2317616,
  'lon': 21.0061014,
  'rating': 4.7,
  'user_ratings_total': 1323},
 {'lat': 52.2373039,
  'lon': 21.0328462,
  'rating': 4.7,
  'user_ratings_total': 916},
 {'lat': 52.2523563,
  'lon': 21.034443,
  'rating': 4.7,
  'user_ratings_total': 79},
 {'lat': 52.2358644,
  'lon': 21.0086243,
  'rating': 4.8,
  'user_ratings_total': 2114},
 {'lat': 52.2412061,
  'lon': 21.0030915,
  'rating': 4.6,
  'user_ratings_total': 2563},
 {'lat': 52.2486241,
  'lon': 21.

In [168]:
pd.DataFrame(lst)

Unnamed: 0,lat,lon,rating,user_ratings_total
0,52.276753,20.948055,4.5,65.0
1,52.205225,21.019903,4.7,912.0
2,52.214206,20.980168,4.7,2299.0
3,52.252034,21.01185,4.5,105.0
4,52.228147,21.026145,4.2,114.0
5,52.250459,21.05091,4.0,7.0
6,52.231762,21.006101,4.7,1323.0
7,52.237304,21.032846,4.7,916.0
8,52.252356,21.034443,4.7,79.0
9,52.235864,21.008624,4.8,2114.0


In [207]:
t_full = pd.concat([t_df, pd.DataFrame(lst)], axis=1)

In [270]:
t_full = t_full[t_full['rating']!="NaN"]

Unnamed: 0,teatr,rodzaj,prowadzący,siedziba,www,lat,lon,rating,user_ratings_total
0,Mazowiecki Teatr Muzyczny Operetka,teatr operetkowy,Włodzimierz Izban,"Bielany, ul. Kolumbijska 3",www.mteatr.pl,52.276753,20.948055,4.5,65
1,Nowy Teatr,teatr dramatyczny,Krzysztof Warlikowski,"Mokotów, ul. Madalińskiego 10/16",www.nowyteatr.org,52.205225,21.019903,4.7,912
2,Och-Teatr,teatr dramatyczny,Katarzyna Błachiewicz (dyrektor teatrów Fundac...,"Ochota, ul. Grójecka 65",www.ochteatr.com.pl,52.214206,20.980168,4.7,2299
3,Stara ProchOFFnia,,Wojciech Feliksiak,"Śródmieście, ul. Boleść 2",www.staraprochoffnia.scek.pl,52.252034,21.01185,4.5,105
4,Studio Buffo,teatr muzyczny,"Janusz Stokłosa (prezes), Janusz Józefowicz (d...","Śródmieście, ul. Konopnickiej 6",www.studiobuffo.com.pl,52.228147,21.026145,4.2,114
5,Studium Teatralne,,,"Praga-Południe, ul. Lubelska 30/32",www.studiumteatralne.pl,52.250459,21.05091,4.0,7
6,Teatr 6. piętro,teatr dramatyczny,Michał Żebrowski i Eugeniusz Korin,"Śródmieście, Pałac Kultury i Nauki – wejście o...",www.teatr6pietro.pl,52.231762,21.006101,4.7,1323
7,Teatr Ateneum,teatr dramatyczny,Ryszard Markow,"Śródmieście, ul. Jaracza 2",www.teatrateneum.pl,52.237304,21.032846,4.7,916
8,Teatr Baj,teatr lalek,Ewa Piotrowska,"Praga-Północ, ul. Jagiellońska 28",www.teatrbaj.waw.pl,52.252356,21.034443,4.7,79
9,Teatr Bajka,teatr komediowy,Natalia Pietkiewicz-Szwarc,"Śródmieście, ul. Marszałkowska 138",www.teatrbajka.pl,52.235864,21.008624,4.8,2114


In [268]:
def db_gen(lat, lon, radius = 500, limit = 100):
  
    client_id = 'W0NOIF0PC5SPINJZ2Q2SAK2KGTN13GLN4BCGVETVLI5L152T'
    client_secret = 'HA2T3EADZZ3CUDB33ZNZL2DCC5CI1KXQE4H3ZVCBIZXHKDTF'
    ver = '20190421'
    category = "4bf58dd8d48988d17f941735"
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&category={}'.format(
    client_id, client_secret, ver, lat, lon, radius, limit, category)


    results= requests.get(url).json()
    response = results['response']
    items = response['groups'][0]['items'][0]['venue']
    response['groups'][0]['items'][0]['venue']

    len(response['groups'][0]['items'])

    items = response['groups'][0]['items']
    lst=[]
    for item in items:
        venue = item['venue']
        try:
            name = venue['name']
            ID = venue['id']
            lat = venue['location']['lat']
            lon = venue['location']['lng']
            city = venue['location']['city']
            adress = venue['location']['address']
            kind = venue['categories'][0]['name']
            short_kind = venue['categories'][0]['shortName']
        except KeyError:
            pass


        lst.append([name, ID, lat, lon, city, adress, kind, short_kind])

    df = pd.DataFrame(lst)
    df.columns=['name', 'id', 'lat', 'lon', 'city', 'address', 'kind', 'short_kind']
    return (df)

In [285]:
client_id = 'W0NOIF0PC5SPINJZ2Q2SAK2KGTN13GLN4BCGVETVLI5L152T'
client_secret = 'HA2T3EADZZ3CUDB33ZNZL2DCC5CI1KXQE4H3ZVCBIZXHKDTF'
ver = '20190421'
category = "4bf58dd8d48988d17f941735"
lat = 52.276753
lon = 20.948055
radius=500
limit=100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
client_id, client_secret, ver, lat, lon, radius, limit)


results= requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ccf4aae6a607121248d0c0b'},
 'response': {'headerLocation': 'Słodowiec',
  'headerFullLocation': 'Słodowiec, Warsaw',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 5,
  'suggestedBounds': {'ne': {'lat': 52.2812530045, 'lng': 20.955396042779643},
   'sw': {'lat': 52.2722529955, 'lng': 20.940713957220357}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '557ef6c3498e07d3e3bb30de',
       'name': 'WakeCup Cafe',
       'location': {'address': 'Al. Zjednoczenia 1',
        'crossStreet': 'Żeromskiego',
        'lat': 52.27706537130918,
        'lng': 20.94682918511807,
        'labeledLatLngs': [{'label': 'display',
          'lat': 52.27706537130918,
          'lng': 20.94682918511807}],
        'distance': 90,
       

In [209]:
warsaw = folium.Map(location = [lat_center, lon_center], zoom_start = 13)


for lat, lon, name in zip(t_full['lat'], t_full['lon'], t_full['teatr']):
    folium.CircleMarker([lat, lon], radius = 5, fill=True, fill_color='orange', 
                        color = 'red', popup=name).add_to(warsaw)
    
warsaw

In [211]:
db_gen(52.2373039, 21.0328462)

Unnamed: 0,name,id,lat,lon,city,address,kind,short_kind
0,Teatr Ateneum,4bf959fd4a67c92857ac26cf,52.237345,21.032777,Warszawa,Jaracza 2,Theater,Theater
1,Nadwiślański Świt,54ea2405498e0dbe7ccd6106,52.238024,21.032533,Warszawa,Wybrzeże Kościuszkowskie 31/33,Restaurant,Restaurant
2,Schody Nad Wisłą,4db5fc380437fa536a4d325e,52.23952,21.033649,Warszawa,Wybrzeże Kościuszkowskie 31/33,Restaurant,Restaurant
3,Mr. Pancake,4dad9714043718a63e35180a,52.23638,21.029831,Warszawa,Solec 50,Creperie,Creperie
4,The Cool Cat,5628fa77498e0749c5a9a9df,52.235178,21.032876,Warszawa,Solec 38,Asian Restaurant,Asian
5,Plac Zabaw nad Wisłą,5575a906498ed7be503ee8e1,52.239288,21.033807,Warszawa,Solec 38,Asian Restaurant,Asian
6,BarKa,518e7f3e498efd1f9f16e4ed,52.239456,21.034178,Warszawa,Bulwar Bohdana Grzymały-Siedleckiego,Other Nightlife,Nightlife
7,Kufle i Kapsle Powiśle,5b46fc5b9411f2002cadba82,52.236263,21.030284,Warszawa,Solec 46A,Pub,Pub
8,CREPE CAFE,4fc0a8ffe4b0b469de7df4a1,52.238013,21.028482,Warszawa,Dobra 19,Café,Café
9,Heritage Shop & Wine,50eb0956e4b0a5a082658641,52.237504,21.027489,Warszawa,Solec 117,Deli / Bodega,Deli / Bodega


In [212]:
venlist = []
for lat, lon, name in zip(t_full['lat'], t_full['lon'],t_full['teatr']):
    try: 
        venues = db_gen(lat, lon, radius = 500)
        print("OK " + name + " " + str(lat) + " " + str(lon))
        venlist.append({'name': name, 'venues': venues['short_kind'].value_counts()})
    except: pass; print("error " + name + " " + str(lat) + " " + str(lon)); venlist.append({'name': name, 'none': 0})

OK Mazowiecki Teatr Muzyczny Operetka 52.2767531 20.9480547
OK Nowy Teatr 52.2052251 21.0199031
OK Och-Teatr 52.214206 20.980168
OK Stara ProchOFFnia 52.2520335 21.0118501
OK Studio Buffo 52.2281469 21.026145
OK Studium Teatralne 52.2504595 21.0509099
OK Teatr 6. piętro 52.2317616 21.0061014
OK Teatr Ateneum 52.2373039 21.0328462
error Teatr Baj 52.2523563 21.034443
OK Teatr Bajka 52.2358644 21.0086243
OK Teatr „Capitol” 52.2412061 21.0030915
error Teatr Collegium Nobilium 52.2486241 21.0079363
OK Teatr Dramatyczny m.st. Warszawy 52.231311 21.007571
OK Teatr Druga Strefa 52.1862106 21.0054782
OK Teatr Guliwer 52.203143 21.019801
OK Teatr IMKA 52.2284013 21.0261534
OK Teatr Kamienica 52.2432744 20.9984949
OK Teatr Komedia 52.2689536 20.9786279
OK Teatr Kwadrat 52.2358644 21.0086243
OK Teatr Lalka 52.2328725 21.0065461
OK Teatr Małego Widza 52.249697 21.0139989
OK Teatr Montownia 52.2284013 21.0261534
OK Teatr Muzyczny „Roma” 52.227404 21.007815
OK Teatr na Woli 52.2290795 20.9704782
OK 

In [292]:
t1 = pd.DataFrame(venlist[1]['venues']).transpose().rename(index={"short_kind": venlist[1]['name']})
t2 = pd.DataFrame(venlist[2]['venues']).transpose().rename(index={"short_kind": venlist[2]['name']})

t1.append(t2, sort=True)
df = pd.DataFrame()
for theatre in venlist:
    try:
        temp = pd.DataFrame(theatre['venues']).transpose().rename(index={"short_kind": theatre['name']})
        df = df.append(temp)
    except:
        pass
    
for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
    
df['venue_sum'] = df.sum(axis=1)

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  sort=sort)


In [308]:
import statsmodels.api as sm
y= merged['rating']
X = sm.add_constant(merged[['venue_sum']])
model2 = sm.OLS(y.astype(float), X.astype(float)).fit()
model2.summary()

0,1,2,3
Dep. Variable:,rating,R-squared:,0.029
Model:,OLS,Adj. R-squared:,0.004
Method:,Least Squares,F-statistic:,1.159
Date:,"Sun, 05 May 2019",Prob (F-statistic):,0.288
Time:,23:26:30,Log-Likelihood:,-44.981
No. Observations:,41,AIC:,93.96
Df Residuals:,39,BIC:,97.39
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,4.2221,0.234,18.006,0.000,3.748,4.696
venue_sum,0.0042,0.004,1.077,0.288,-0.004,0.012

0,1,2,3
Omnibus:,86.254,Durbin-Watson:,2.079
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1750.412
Skew:,-5.427,Prob(JB):,0.0
Kurtosis:,33.113,Cond. No.,121.0


In [309]:
import statsmodels.api as sm
y= merged['user_ratings_total']
X = sm.add_constant(merged[['venue_sum']])
model3 = sm.OLS(y.astype(float), X.astype(float)).fit()
model3.summary()

0,1,2,3
Dep. Variable:,user_ratings_total,R-squared:,0.069
Model:,OLS,Adj. R-squared:,0.045
Method:,Least Squares,F-statistic:,2.888
Date:,"Sun, 05 May 2019",Prob (F-statistic):,0.0972
Time:,23:27:46,Log-Likelihood:,-345.28
No. Observations:,41,AIC:,694.6
Df Residuals:,39,BIC:,698.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,323.9605,355.686,0.911,0.368,-395.482,1043.403
venue_sum,10.0808,5.932,1.699,0.097,-1.918,22.080

0,1,2,3
Omnibus:,29.958,Durbin-Watson:,1.856
Prob(Omnibus):,0.0,Jarque-Bera (JB):,59.458
Skew:,2.015,Prob(JB):,1.23e-13
Kurtosis:,7.308,Cond. No.,121.0


In [None]:
def collect_more(init_lat, init_lon, n=100):
    df1 = db_gen(init_lat, init_lon)
    
    i=0
    
    while (i < n): 
        i=i+1
        clear_output(wait=True)
        print(str(100*(i)/n), "%")
        try:
            sample = df1.sample(1)
            new_lat = sample['lat'].tolist()[0]
            new_lon = sample['lon'].tolist()[0]
            df2 = db_gen(new_lat, new_lon)
            df1 = df1.append(df2)
            df1.drop_duplicates('id', inplace=True)
        except: print("error, continue"); continue    
    return(df1)

In [186]:
lat_center, lon_center = 52.225665764, 21.003833318
places = collect_more(lat_center, lon_center, n=50)
places

100.0 %


Unnamed: 0,name,id,lat,lon,city,address,kind,short_kind
0,Hampton by Hilton Warsaw City Centre,53a45b1d498e35d5abaf3c5e,52.225347,21.004087,Warszawa,Wspólna 72,Hotel,Hotel
1,Bar Pacyfik,58ff9dd3b9ac3810e95fdbe2,52.224703,21.007501,Warszawa,Wspólna 72,Hotel,Hotel
2,Marriott Warsaw - Executive Lounge,4eda4e29be7be28337954c8c,52.227947,21.005343,Warszawa,Al. Jerozolimskie 65 / 79,Hotel Bar,Hotel Bar
3,Marriot Skybar,554ea454498e220a445844ca,52.227817,21.005551,Warszawa,Al. Jerozolimskie 65 / 79,Hotel Bar,Hotel Bar
4,Marriott Warsaw,4b6c32f3f964a52006292ce3,52.227688,21.004712,Warszawa,Al. Jerozolimskie 65/79,Hotel,Hotel
5,Mandala,4bb3ca5414cfd13a3b8e16ab,52.223658,21.007222,Warszawa,Emilii Plater 9/11,Indian Restaurant,Indian
6,Teatr Roma,4ca21ef554c8a1cd4af9ad4b,52.227756,21.007759,Warszawa,Nowogrodzka 49,Theater,Theater
7,Panorama Sky Bar,4c3787832c8020a193318900,52.227819,21.005557,Warszawa,Aleje Jerozolimskie 65/79,Hotel Bar,Hotel Bar
8,Katmandu,4c039d549a7920a14c17d079,52.226146,21.008356,Warszawa,Wspólna 65A,Indian Restaurant,Indian
9,Flambéeria,541dbe49498e5719ff9d20c2,52.224518,21.007545,Warszawa,Hoża 61,Bistro,Bistro


In [None]:
warsaw = folium.Map(location = [lat_center, lon_center], zoom_start = 13)


for lat, lon, short_kind in zip(places['lat'], places['lon'], places['short_kind']):
    folium.CircleMarker([lat, lon], radius = 5, fill=True, fill_color='orange', 
                        color = 'red', popup=short_kind).add_to(warsaw)
    
warsaw