### <font color='orange'>General feedback</font>
Thank you for sending your project. It's clear that a lot of work has been put into it. Excellent job, you have truly mastered the plotly library. Unfortunately, I can't finish my review because of a code error. Please fix this issue.

   
<div class="alert alert-info"> Hello, Grigoriy!
I tried to check the functionality of the code on the platform, and I had to comment out part of the code, because the platform's current version of Plotly does not support methods such as "add_shape" and "add_annotation". Because of this, some graphs have become less informative. <br /> Unfortunately, I was unable to locate all the problems in advance, so you were unable to run the code. :(
I corrected everything. I look forward to your review.</div>

**Update**:<br>
Thank you for the update! Well, at least I've seen the graphs in your presentation :) I'm glad to say that you did a very good job; your project has been accepted. Good luck in the next sprint!

<div class="alert alert-warning">
<b>Reviewer's comment: </b> Additional links:
    <ul>
        <li>Barplot vs pie chart overview: <a>https://chartio.com/learn/charts/how-to-choose-pie-chart-vs-bar-chart/</a></li>
        <li>Pandas profiling: <a>https://github.com/pandas-profiling/pandas-profiling</a></li>
        <li>Top 50 matplotlib visualizations: <a>https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/</a></li>
    </ul>
</div>

---

### Market research of Los Angeles establishments.

This is the work of Anton Rubenchik anton.rubenchik@gmail.com<br />
Presentation: <br />
https://github.com/rubenchick/Data_Analyst/blob/main/Yandex.Practicum/Project%209/Robo%20cafe.pdf

<div class="alert alert-success">
    <b>Reviewer's comment v2: </b> Brilliant job, thank you for the opportunity to see your plots without platform limitations :)
</div>

##### Table of contents
0. [Introduction](#introduction)
1. [Download the data and prepare it for analysis.](#prepare)<br />
1.1 [Loading data in an optimized form.](#loading_data)<br />
1.2. [Data preprocessing](#data_preprocessing)<br />
2. [Data analysis](#data_analysis)<br />
2.1 [Investigate the proportions of the various types of establishments.](#proportions_types)<br />
2.2 [The proportions of chain and nonchain establishments.](#chain_nonchain)<br />
2.3 [Proportions of chain and non-chain establishments with a breakdown by type.](#chain_nonchain_type)<br />
2.4 [Features of chain establishments.](#features_of_chain)<br />
2.5 [Determine the average number of seats for each type of restaurant.](#average_seats)<br />
2.6 [Put the data on street names in a separate column.](#street_column)<br />
2.7 [Top ten streets for the number of restaurants.](#top10_street)<br />
2.8 [Streets with only one restaurant.](#street_one)<br />
2.9 [Analysis of the distribution of seats in restaurants on streets with a large number of restaurants.](#distribution_of_seats)<br />
3. [Overall conclusion.](#overall_conclusion)




#### 0. Introduction<a name="introduction"></a><br />
My partners and I decided to open a small cafe with robots in Los Angeles. The project is promising but expensive, so we decided to try to attract investors. They are interested in current market conditions - can we continue to succeed when the novelty of the robotic waiters disappears?<br /><br />
To prepare a presentation for investors, I have to do market research. I have open-source data about restaurants in Los Angeles.

#### Description of the data <br />

Dataset "rest_data_us.csv"
- object_name — establishment name
- chain — chain establishment (TRUE/FALSE)
- object_type — establishment type
- address — address
- number — number of seats

<div class="alert alert-success">
<b>Reviewer's comment:</b> Nice introduction! Great that you have added such a detailed interactive table of contents; it's a very useful tool for navigating the project.
</div>

#### Step 1. Download the data and prepare it for analysis.<a name="prepare"></a>

In [1]:
# We will extract the street from the address, so we need to install this module
%pip install -q usaddress

Note: you may need to restart the kernel to use updated packages.


<div class="alert alert-warning">
<b>Reviewer's comment: </b> You can hide pip  output with the -q key: <code>!pip install -q usaddress</code>
</div>

In [2]:
import pandas as pd
from scipy import stats as st
import numpy as np
import math
import usaddress
# #graph
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
from plotly import graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.figure_factory as ff

color_dict = ['#00b894','#fdcb6e','#d63031','#0984e3','#e84393','#6c5ce7']
pd.options.display.float_format = "{:.2f}".format

<div class="alert alert-success">
<b>Reviewer's comment: </b> 👍
</div>

##### Preliminary analysis of the data structure.

In [3]:
rest_path   = 'rest_data_us.csv'
platform_path = 'https://code.s3.yandex.net/datasets/'

try:
    rest   = pd.read_csv(rest_path, nrows=500)
except:
    rest  = pd.read_csv(platform_path+rest_path, nrows=500)


In [4]:
print('Restaurant: \n')
print(rest.info())

Restaurant: 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   id           500 non-null    int64 
 1   object_name  500 non-null    object
 2   address      500 non-null    object
 3   chain        500 non-null    bool  
 4   object_type  500 non-null    object
 5   number       500 non-null    int64 
dtypes: bool(1), int64(2), object(3)
memory usage: 20.1+ KB
None


In [5]:
rest[:5]

| | id | object_name | address | chain | object_type | number |
|---|---|---|---|---|---|---|
| 0 | 11786 | HABITAT COFFEE SHOP | 3708 N EAGLE ROCK BLVD | False | Cafe | 26 |
| 1 | 11787 | REILLY'S | 100 WORLD WAY # 120 | False | Restaurant | 9 |
| 2 | 11788 | STREET CHURROS | 6801 HOLLYWOOD BLVD # 253 | False | Fast Food | 20 |
| 3 | 11789 | TRINITI ECHO PARK | 1814 W SUNSET BLVD | False | Restaurant | 22 |
| 4 | 11790 | POLLEN | 2100 ECHO PARK AVE | False | Restaurant | 20 |


In [6]:
rest['object_type'].value_counts()

object_type
Restaurant    378
Fast Food      61
Bakery         19
Bar            15
Pizza          14
Cafe           13
Name: count, dtype: int64

We need to convert the "object_type" field to the category type.

##### 1.1 Loading data in an optimized form.<a name="loading_data"></a>

In [7]:
try:
    rest   = pd.read_csv(rest_path,  
                         dtype={'object_type': 'category'})
except:
    rest  = pd.read_csv(platform_path + rest_path,  
                        dtype={'object_type': 'category'})

#### 1.2. Data preprocessing<a name="data_preprocessing"></a>

In [8]:
rest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9651 entries, 0 to 9650
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   id           9651 non-null   int64   
 1   object_name  9651 non-null   object  
 2   address      9651 non-null   object  
 3   chain        9648 non-null   object  
 4   object_type  9651 non-null   category
 5   number       9651 non-null   int64   
dtypes: category(1), int64(2), object(3)
memory usage: 386.8+ KB


In [9]:
rest = rest.rename(columns = {'object_name':'name','object_type':'type'})

##### 1.2.1 Column  "id"

In [10]:
print('Duplicates found - ',rest[['id']].duplicated().sum(),' records.')

Duplicates found -  0  records.


##### 1.2.2 Columns  "name" and "address"

In [11]:
def create_clear_column(df, cols):
    for col in cols:
        df[col+'_clean'] = df[col].str.lower()
        df[col+'_clean'] = df[col+'_clean'].replace('[^a-zA-Z0-9 ]', '', regex=True)
    return df
cols = ['name', 'address']
# Let's remove the extra characters in the name and address
rest = create_clear_column(rest, cols)
# Finding duplicates
print('Duplicates found - ',rest[['name_clean','address_clean']].duplicated().sum(),' records.')

Duplicates found -  29  records.


We found 29 duplicate records, so we delete them and keep the record with the highest id in each group.

In [12]:
rest = rest.sort_values(by = 'id', ascending = True)
rest.drop_duplicates(subset=['name_clean','address_clean'], keep= 'last', inplace=True)

In [13]:
print('Duplicates found - ',rest[['name_clean','address_clean']].duplicated().sum(),' records.')

Duplicates found -  0  records.
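
The normalization behind this duplicate check can be sketched in plain Python, without pandas. The records below are made up for illustration, and `normalize` is a hypothetical helper that mirrors what `create_clear_column` does: lowercase the text, then drop everything except letters, digits and spaces. As in the cells above, the record with the highest id wins.

```python
import re

def normalize(text):
    """Lowercase the text and keep only letters, digits and spaces."""
    return re.sub(r'[^a-z0-9 ]', '', text.lower())

# Made-up records (id, name, address), not rows from the dataset
records = [
    (1, "JOE'S CAFE", "100 MAIN ST."),
    (2, "Joes Cafe", "100 Main St"),
    (3, "BLUE BAR", "7 OAK AVE"),
]

# Walk the records in ascending id order so that a later id
# overwrites an earlier one with the same normalized key
latest = {}
for rec_id, name, address in sorted(records):
    latest[(normalize(name), normalize(address))] = rec_id

kept_ids = sorted(latest.values())
print(kept_ids)
```

Records 1 and 2 differ only in case and punctuation, so they collapse into one key and only the later id survives.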


##### 1.2.3 Column  "chain"

It is immediately striking that there are no data in the "chain" column for three records. Let's restore them.

In [14]:
def fill_nan(selection):    
    return len(selection) > 1

rest['chain'] = rest['chain'].fillna(
    rest.
    groupby(['name'])['chain'].
    transform(fill_nan)
)
rest.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9622 entries, 0 to 9650
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   id             9622 non-null   int64   
 1   name           9622 non-null   object  
 2   address        9622 non-null   object  
 3   chain          9622 non-null   bool    
 4   type           9622 non-null   category
 5   number         9622 non-null   int64   
 6   name_clean     9622 non-null   object  
 7   address_clean  9622 non-null   object  
dtypes: bool(1), category(1), int64(2), object(4)
memory usage: 545.2+ KB
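
The fill rule used in this cell is a heuristic: a missing "chain" flag becomes True when the establishment's name occurs more than once in the data, and False otherwise. A minimal plain-Python sketch of the same rule, using made-up rows rather than the real records:

```python
from collections import Counter

# Made-up rows: (name, chain), where None marks a missing value
rows = [
    ("TACO HUT", True),
    ("TACO HUT", None),
    ("LONE DINER", None),
]

# How many times each name occurs
name_counts = Counter(name for name, _ in rows)

# Keep known flags; fill a missing flag with "name occurs more than once"
filled = [
    (name, chain if chain is not None else name_counts[name] > 1)
    for name, chain in rows
]
print(filled)
```

Note that the rule depends on exact name matches, so differently spelled names of the same chain would be counted separately.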




##### 1.2.4 Column  "type"

In [15]:
rest['type'].value_counts()


type
Restaurant    7235
Fast Food     1063
Cafe           434
Pizza          315
Bar            292
Bakery         283
Name: count, dtype: int64

##### 1.2.5 Column  "number"

In [16]:
rest_with_small_number_seats = ( 
    rest.
    pivot_table(index = 'number', values = 'id', aggfunc = 'count').reset_index()
)
rest_with_small_number_seats[:5]

| | number | id |
|---|---|---|
| 0 | 1 | 186 |
| 1 | 2 | 177 |
| 2 | 3 | 191 |
| 3 | 4 | 173 |
| 4 | 5 | 197 |


No missing or zero values were found in the "number" column. Approximately 10% (924) of establishments have fewer than 6 seats. Most likely, these establishments specialize in street vending or takeaway sales.
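
The figure of 924 can be checked directly from the table above. The counts are copied from that output, and the total of 9622 is the number of rows left after removing duplicates:

```python
# Establishments with 1 to 5 seats, copied from the table above
small_counts = {1: 186, 2: 177, 3: 191, 4: 173, 5: 197}
total_establishments = 9622  # rows left after removing duplicates

small_total = sum(small_counts.values())
share = small_total / total_establishments
print(small_total, f"{share:.1%}")
```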

<div class="alert alert-warning">
<b>Reviewer's comment: </b> Well done, you've covered all the necessary data preparation steps. In this research it's also reasonable to check the dataframe for partial duplicates, ignoring id and number of seats. Sometimes data is collected from different sources and some rows may contradict each other.
</div>

### Step 2. Data analysis<a name="data_analysis"></a>

#### 2.1 Investigate the proportions of the various types of establishments. <a name="proportions_types"></a>

In [17]:
#Let's count the number of establishments of each type
rest_types = rest.groupby('type',as_index = False).agg({'id':'count'})
rest_types = rest_types.rename(columns = {'id':'count'})
rest_types



| | type | count |
|---|---|---|
| 0 | Bakery | 283 |
| 1 | Bar | 292 |
| 2 | Cafe | 434 |
| 3 | Fast Food | 1063 |
| 4 | Pizza | 315 |
| 5 | Restaurant | 7235 |


In [18]:
#Let's build a graph
name_rest = rest_types['type']
values = rest_types['count']
fig = go.Figure(data=[go.Pie(labels=name_rest, 
                             values=values)]
               )
fig.update_traces(
    hoverinfo='label+value',
    textinfo='percent',
    textfont_size=18,
    marker=dict(
        colors = color_dict,
        line = dict(
            color='#000000',
            width=2)
    )
)
fig.update_layout(
    title = dict(
        text = 'Types of establishments',
        font = dict(
            family="Balto",
            size = 24,
            color = "black"),
        x = 0.46,
        xanchor = "center",
        yanchor = "middle"
    ),
    font_family="Courier New",
    font_size = 20 # legend
)

fig.show()


ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

##### Conclusion: <br />
As we can see, more than **75%** of all establishments are restaurants. Fast Food ranks second with a share of **11%**, and each of the other types accounts for roughly 3% to 4.5%.
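
The shares quoted in this conclusion follow from the counts in the table above. A short check, with the counts copied from that output:

```python
# Establishment counts by type, copied from the table above
counts = {
    "Bakery": 283,
    "Bar": 292,
    "Cafe": 434,
    "Fast Food": 1063,
    "Pizza": 315,
    "Restaurant": 7235,
}

total = sum(counts.values())
# Share of each type in percent, rounded to one decimal place
shares = {kind: round(100 * n / total, 1) for kind, n in counts.items()}

for kind, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{kind:<11}{pct:>5}%")
```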

<div class="alert alert-success">
<b>Reviewer's comment: </b> Very nice styling!
</div>

#### 2.2 The proportions of chain and nonchain establishments. <a name="chain_nonchain"></a>

In [None]:
#Let's count the number of chain and non-chain establishments
rest_chain = ( 
    rest.groupby('chain',as_index = False).
    agg({'id':'count'}).
    sort_values(by = 'chain', ascending = False)
)
rest_chain = rest_chain.rename(columns = {'id':'count'})
rest_chain

| | chain | count |
|---|---|---|
| 1 | True | 3657 |
| 0 | False | 5965 |


In [None]:
#Let's build a graph
labels = ['Chain', 'Nonchain']
values = rest_chain['count']
fig = go.Figure(data=[go.Pie(labels=labels, values=values)])
fig.update_traces(
    hoverinfo='label+value',
    textinfo='percent',
    textfont_size=18,
    marker=dict(
        colors = color_dict,
        line = dict(
            color='#000000',
            width=2)
    )
)
fig.update_layout(
    title = dict(
        text = 'Proportions of chain and non-chain establishments',
        font = dict(
            family="Balto",
            size = 24,
            color = "black"),
        x = 0.46,
        xanchor = "center",
        yanchor = "middle"
    ),
    font_family="Courier New",
    font_size = 20 # legend
)

fig.show()

##### Conclusion <br />
As we can see, about 62% of all establishments (5965 of 9622) are not part of a chain.

<div class="alert alert-success">
<b>Reviewer's comment: </b> Yes, you are right.
</div>

#### 2.3 Proportions of chain and non-chain establishments with a breakdown by type. <a name="chain_nonchain_type"></a>

In [None]:
rest_type_dict = list(rest['type'].unique())

In [None]:
def slice_rest(rest_type3):
    df = rest[rest.type == rest_type3].groupby('chain',as_index = False).count()[['chain','id']]
    df['ratio'] = df['id'] / df['id'].sum()
    return df.sort_values(by= 'chain', ascending = False)['ratio']

In [None]:
def add_pie(rest_type2,row,column):
    fig.add_trace(go.Pie(labels=labels, 
                         values=slice_rest(rest_type2), 
                         name=rest_type2,
                         marker_colors=color_dict, 
                         title = rest_type2), row, column)

In [None]:
# Create subplots, using 'domain' type for pie charts
specs = [[{'type':'domain'}, {'type':'domain'}, {'type':'domain'}],
         [{'type':'domain'}, {'type':'domain'}, {'type':'domain'}]]

fig = make_subplots(rows=2, cols=3, specs=specs)

# Plot a pie chart for each category
for idx, item in enumerate(rest_type_dict):
    number = idx + 1
    add_pie(item,idx//3+1,idx%3+1)
# Tune layout and hover info
fig.update(layout_title_text='The proportions of chain and nonchain establishments.',
           layout_showlegend=True,layout_font_size = 20 )

fig.update_traces(
    hoverinfo='label',
    textinfo='percent',
    textfont_size=16,
    marker=dict(
        colors = color_dict,
        line = dict(
            color='#000000',
            width=2)
    )
)

fig = go.Figure(fig)
fig.show()

##### Conclusion<br />
As we learned earlier, about 62% of all establishments are not part of a chain. Looking in more detail, the share of chain establishments depends on the type. **Fast Food**, **Cafe** and **Bakery** establishments are usually chains. **Bar** and **Restaurant** establishments are mostly non-chain. **Pizza** establishments are split approximately equally between chain and non-chain.

<div class="alert alert-success">
<b>Reviewer's comment: </b> Excellent plot grid!
</div>

#### 2.4 Features of chain establishments.<a name="features_of_chain"></a>

In [None]:
rest_chain_number_mean = round(rest[rest['chain'] == True]['number'].mean(),0)
rest_nonchain_number_mean = round(rest[rest['chain'] == False]['number'].mean(),0)
print('The average number of places in chain establishments is {:}'.format(rest_chain_number_mean))
print('The average number of places in nonchain establishments is {:}'.format(rest_nonchain_number_mean))

The average number of places in chain establishments is 40.0
The average number of places in nonchain establishments is 46.0


In [None]:
fig = px.histogram(rest[rest['chain'] == True],
                   x="number",
                   color = 'type', 
                   title="Distribution of the number of seats in chain establishments.",
                   barmode = "stack",
                   color_discrete_sequence=color_dict,
                   nbins = 65,
                   labels = {'type': 'Type:'}
                  )
fig.update_layout(yaxis_title="Number of Establishments",xaxis_title="Number of Seats")

## Unfortunately, the current version of Plotly does not support the methods add_annotation and add_shape :(

# fig.add_shape(
#     go.layout.Shape(type='line', xref='x', yref='y',
#                     x0=rest_chain_number_mean, 
#                     y0=0, 
#                     x1=rest_chain_number_mean, 
#                     y1=400, line={'dash': 'dash'}),
#     row=1, col=1
# )
# fig.add_annotation(x=rest_chain_number_mean, y=380,
#                    text="Mean: " + str(int(rest_chain_number_mean)),
#                    showarrow=True, 
#                    yshift=20,
#                    align="center",
#                    arrowhead=2,
#                    arrowsize=1,
#                    arrowwidth=2,
#                    arrowcolor="#636363",
#                    ax=20,
#                    ay=-30,
#                    bordercolor="#c7c7c7",
#                    borderwidth=2,
#                    borderpad=4,
#                    bgcolor="#ff7f0e",
#                    opacity=0.8
#                   )

fig.show()

<div class="alert alert-warning">
<b>Reviewer's comment: </b> Well done. Please note that in this case a scatter plot (chain size vs. number of seats) would be more helpful.
</div>

##### Conclusion<br />
As we can see, most chain establishments have a small number of seats. For such types of establishments as **"Bakery"**, **"Cafe"** and **"Pizza"**, the share of establishments with a large number of seats is very small (less than 5%). <br /><br />
We also discovered the surprising fact that there are no chain establishments with a capacity of **49-59** seats. Perhaps this is due to the peculiarities of taxation in Los Angeles, but unfortunately, I could not find an article explaining this gap. <br /><br />
We also found out that the average number of seats in chain establishments is less than in non-chain establishments.
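
The gap in the 49-59 range was read off the histogram, but it can also be checked directly. The sketch below uses a hypothetical helper, `missing_values`, and a made-up list of seat counts; in the notebook the input would be the "number" column of the chain establishments.

```python
def missing_values(seats, low, high):
    """Return the seat counts in [low, high] that never occur in `seats`."""
    present = set(seats)
    return [n for n in range(low, high + 1) if n not in present]

# Made-up seat counts for illustration only
chain_seats = [12, 25, 48, 48, 60, 75, 120]

gap = missing_values(chain_seats, 49, 59)
# True when no establishment has between 49 and 59 seats
print(gap == list(range(49, 60)))
```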


##### Share of chain establishments with less than 50 seats.

In [None]:
#Column "number_less_50" indicates whether an establishment has fewer than 50 seats
rest_seat = rest[rest.chain == True].reset_index()
rest_seat['number_less_50'] = rest_seat.number < 50
rest_seat['number_less_50'] = (
    rest_seat['number_less_50'].map({True: 'less than 50', False: 'more than 50'})
)

In [None]:
rest_seat_label = (
    rest_seat[rest_seat.number < 50].groupby('type').agg({'id':'count'}) /
    rest_seat.groupby('type').agg({'id':'count'}) 
)
rest_seat_label = rest_seat_label.reset_index()
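
The labels for the stacked bars need to be a plain sequence of percentages, one per type, in the same order as the bars. A plain-Python sketch of that calculation, using made-up rows rather than the real data, so that no values have to be typed in by hand:

```python
from collections import defaultdict

# Made-up (type, number_of_seats) pairs for chain establishments
chain_rows = [
    ("Bakery", 10), ("Bakery", 30), ("Bakery", 45), ("Bakery", 80),
    ("Bar", 20), ("Bar", 120),
]

totals = defaultdict(int)
under_50 = defaultdict(int)
for kind, seats in chain_rows:
    totals[kind] += 1
    if seats < 50:
        under_50[kind] += 1

# One rounded percentage per type, in alphabetical order of the type
labels = [round(100 * under_50[kind] / totals[kind]) for kind in sorted(totals)]
print(labels)
```

Iterating over a list like this yields the values themselves, whereas iterating over a DataFrame yields its column names.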

In [None]:
# rest_seat_label = [82,79,95,77,93,97]

<div class="alert alert-danger">
<s><b>Reviewer's comment:</b> Please don't use hardcoded values.</s>
</div>

In [None]:
#Let's build a graph
def add_label(number,value):
     fig.add_annotation(x=number, y=value-6,
                   text="<b>"+str(value)+"%</b>" ,
                   showarrow=False, 
                   font = dict(
                       family="Balto",
                       size = 16,
                       color = "white"),
                  )

fig = px.histogram(rest_seat,
                   x="type",
                   color = 'number_less_50',
                   title="Share of establishments with less than 50 seats",
                   barmode = "stack",
                   barnorm = "percent",
                   color_discrete_sequence= color_dict, 
                   labels = {'number_less_50': 'The number of seats:'}
                  )
fig.update_layout(yaxis_title="%",xaxis_title="")

## Unfortunately, the current version of Plotly does not support the method add_annotation :(

# for idx, value in enumerate(rest_seat_label):
#     add_label(idx,value)

fig.show()



##### Conclusion <br />
As we assumed earlier, among **"Bakery"**, **"Cafe"** and **"Pizza"** establishments the share with a large number of seats (50 or more) is very small (less than **5%**), while among **"Fast Food"**, **"Restaurant"** and **"Bar"** establishments it is more than **20%**.

#### 2.5 Determine the average number of seats for each type of restaurant. <a name="average_seats"></a>

In [None]:
rest_seat_mean = rest.groupby('type', as_index = False).agg({'number':'mean'})

<div class="alert alert-danger">
<s><b>Reviewer's comment:</b> ☠️</s>
</div>


In [None]:
fig = go.Figure()
fig.add_trace(go.Bar(x=rest_seat_mean.type,
                     y=rest_seat_mean.number,
                     text=round(rest_seat_mean.number,0),
                     textsrc='%{text:.1s}',
#                      texttemplate='%{text:.2s}',
                     textposition='outside',
                     textfont_size=14,
                     marker=dict(
                         color='rgba(50, 171, 96, 0.6)',
                         line=dict(
                             color='rgba(50, 171, 96, 1.0)',
                             width=1),
                     )
                    ))

fig.update_layout(
    title='The average number of seats for each type of establishments.',
    xaxis_tickfont_size=14,
    yaxis=dict(
        title='Number of Seats',
        titlefont_size=16,
        tickfont_size=14,
    ),
    bargroupgap=0.1 # gap between bars of the same location coordinate.
)
fig.show()

##### Conclusion<br /> 
As we can see, Restaurants and Bars have the largest average number of seats (48 and 45, respectively), while Cafes and Bakeries have the lowest.

<div class="alert alert-success">
<b>Reviewer's comment: </b> Another one great plot :) Yes, you are right!
</div>

#### 2.6 Put the data on street names in a separate column.<a name="street_column"></a>

In [None]:
def extract_street(raw):
    # When I tested this feature, I found out that the street called "olvera" 
    # is not converting correctly. Let's create an exception.
    if raw.startswith('OLVERA'):
        return 'OLVERA'
    elif raw.startswith('1033 1/2 LOS ANGELES'):
        return 'LOS ANGELES'
    else:
        raw_address=usaddress.parse(raw)
        dict_address={}
        for i in raw_address:
            if (i[1] == 'StreetName') and ('StreetName' in dict_address.keys()):
                text = dict_address[i[1]] + ' ' + i[0] 
                dict_address.update({i[1]:text})            
            else:
                dict_address.update({i[1]:i[0]})
        #this line below checks for normal case with street and number    
        if 'StreetName' in dict_address.keys() :
            street = dict_address['StreetName']
            return street
        else:
            return 'no street'


In [None]:
rest['street']=rest.address.apply(extract_street)

##### Conclusion<br />

As we can see, we have extracted the street from the address and created a new column with these values.
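
The core of `extract_street` is joining every token that the parser labels as `StreetName`. The sketch below isolates that step in a hypothetical helper, `street_from_tokens`. The input is a hand-written list of (token, label) pairs in the shape that `usaddress.parse` returns; which labels a real address actually receives is an assumption here.

```python
def street_from_tokens(tagged_tokens):
    """Join all tokens labelled 'StreetName'; return 'no street' if there are none."""
    parts = [token for token, label in tagged_tokens if label == 'StreetName']
    return ' '.join(parts) if parts else 'no street'

# Hand-written pairs for illustration; not real parser output
tagged = [
    ('3708', 'AddressNumber'),
    ('N', 'StreetNamePreDirectional'),
    ('EAGLE', 'StreetName'),
    ('ROCK', 'StreetName'),
    ('BLVD', 'StreetNamePostType'),
]

print(street_from_tokens(tagged))
```

Multi-word street names are kept whole, while the house number, direction and street type are dropped.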

#### 2.7 Top ten streets for the number of restaurants.<a name="top10_street"></a>

In [None]:
rest['street'] = rest['street'].str.lower()
#Let's count the number of establishments on each street
number_rest_street = rest.groupby('street',as_index = False).agg({'id':'count'})
top_ten_street = number_rest_street.sort_values(by = 'id', ascending = False).head(10).copy()
top_ten_street['street'] = top_ten_street['street'].str.capitalize()

In [None]:
#Let's build a graph
fig = go.Figure()
fig.add_trace(go.Bar(x=top_ten_street.id,
                     y=top_ten_street.street,
                     text=top_ten_street.id,
                     orientation='h',
#                      texttemplate='%{text:.3s}',
                     textsrc='%{text:.1s}',
                     textposition='outside',
                     textfont_size=14,
                     marker=dict(
                         color='rgba(50, 171, 96, 0.6)',
                         line=dict(
                             color='rgba(50, 171, 96, 1.0)',
                             width=1),
                     )
                    ))
## Unfortunately, the current version of Plotly does not support the method update_traces :(
# fig.update_traces(
#     texttemplate='%{text:.3s}',
#     textposition='outside',
#     textfont_size=14)

fig.update_layout(
    title='Top ten streets for the number of establishments',
    xaxis=dict(
        title='Number of establishments',
        titlefont_size=16,
        tickfont_size=14,
    ),
    yaxis=dict(
        title='Street',
        titlefont_size=16,
        tickfont_size=14,
        autorange="reversed"
    ),
    bargroupgap=0.1 
)
fig.show()
                 

##### Conclusion<br />
As you can see, the street with the most establishments is Sunset.


<div class="alert alert-warning">
<b>Reviewer's comment: </b> Well done! What do these streets have in common?
</div>

#### 2.8 Streets with only one restaurant.<a name="street_one"></a>

In [None]:
one_rest_street_list = list(number_rest_street[number_rest_street.id == 1]['street'])
print('There is only one restaurant on {:} streets.'.format(len(one_rest_street_list)))

There is only one restaurant on 197 streets.
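
Both the top-ten ranking and the count of single-establishment streets come from the same per-street tally. A minimal plain-Python sketch with a made-up list of street names, where each entry stands for one establishment:

```python
from collections import Counter

# Made-up street names, one entry per establishment
streets = ['sunset', 'sunset', 'sunset', 'wilshire', 'wilshire', 'olvera']

per_street = Counter(streets)

# Streets with the most establishments (top 2 here, top 10 in the notebook)
top_two = per_street.most_common(2)

# Streets that have exactly one establishment
single = sorted(street for street, n in per_street.items() if n == 1)

print(top_two)
print(single)
```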


In [None]:
one_rest_street = rest.query('street in @one_rest_street_list')
one_rest_street = one_rest_street.groupby('type',as_index = False).agg({'id':'count'})

In [None]:
# Create subplots, using 'domain' type for pie charts
specs = [[{'type':'domain'}, {'type':'domain'}]]
fig = make_subplots(rows=1, cols=2, specs=specs)

fig.add_trace(go.Pie(labels=one_rest_street['type'], 
                     values=one_rest_street['id'], 
                     name='rest_type2',
                     marker_colors=color_dict, 
                     title = "Street with one establishment", 
                     rotation = 290), 1, 1)
fig.add_trace(go.Pie(labels=rest_types['type'], 
                     values=rest_types['count'], 
                     name='rest_type2',
                     marker_colors=color_dict, 
                     title = "All streets", 
                     rotation = 305), 1, 2)

# Tune layout and hover info
fig.update(layout_title_text='The proportions of the types of establishments',
           layout_showlegend=True,layout_font_size = 20 )

fig.update_traces(
    hoverinfo='label',
    textinfo='percent',
    textfont_size=16,
    marker=dict(
        colors = color_dict,
        line = dict(
            color='#000000',
            width=2)
    )
)

fig = go.Figure(fig)
fig.show()

##### Conclusion<br />
As we learned, there is only one establishment on **197** streets. On these streets, the share of restaurants is higher than across all streets.

<div class="alert alert-success">
<b>Reviewer's comment: </b> 🔥🔥🔥
</div>

#### 2.9 Analysis of the distribution of seats in restaurants on streets with a large number of restaurants.<a name="distribution_of_seats"></a>

In [None]:
#create a list with top10 streets
top_ten_street['street'] = top_ten_street.apply(lambda x: x['street'].lower(), axis=1)
rest_top10_street_list = list(top_ten_street['street'])

In [None]:
# add column top10, True - if the restaurant is on the street top10
rest['top10'] = False
rest['top10'] = rest['street'].apply(lambda x: x in rest_top10_street_list)

In [None]:
# Calculate the average number of seats of an establishment in the top 10 and others
rest_top10_number_mean = round(rest[rest.top10 == True]['number'].mean(),0)
rest_nottop10_number_mean = round(rest[rest.top10 == False]['number'].mean(),0)
print('The average number of places in Top10 establishments is {:}'.format(rest_top10_number_mean))
print('The average number of places in not Top10 establishments is {:}'.format(rest_nottop10_number_mean))

The average number of places in Top10 establishments is 46.0
The average number of places in not Top10 establishments is 43.0


In [None]:
#Let's build a graph
fig = px.histogram(rest[rest['top10'] == True],
                   x="number",
                   color = 'type', 
                   title="Distribution of the number of seats in establishments on the streets of Top10.",
                   barmode = "stack",
                   color_discrete_sequence=color_dict,
                   nbins = 65,
                   labels = {'type': 'Type:'}
                  )
fig.update_layout(yaxis_title="Number of Establishments",xaxis_title="Number of Seats")

## Unfortunately, the current version of Plotly does not support the methods add_shape and add_annotation :(

# fig.add_shape(
#     go.layout.Shape(type='line', xref='x', yref='y',
#                     x0=rest_top10_number_mean, 
#                     y0=0, 
#                     x1=rest_top10_number_mean, 
#                     y1=400, line={'dash': 'dash'}),
#     row=1, col=1
# )
# fig.add_annotation(x=rest_top10_number_mean, y=380,
#                    text="Mean: " + str(int(rest_top10_number_mean)),
#                    showarrow=True, 
#                    yshift=20,
#                    align="center",
#                    arrowhead=2,
#                    arrowsize=1,
#                    arrowwidth=2,
#                    arrowcolor="#636363",
#                    ax=20,
#                    ay=-30,
#                    bordercolor="#c7c7c7",
#                    borderwidth=2,
#                    borderpad=4,
#                    bgcolor="#ff7f0e",
#                    opacity=0.8
#                   )

fig.show()

We calculated the average number of seats on the central (Top10) streets. It is 3 seats higher than on the other streets. Let's build a density graph and see whether the distributions support this.

In [None]:
hist_data = [rest[rest['top10'] == True]['number'], rest[rest['top10'] == False]['number']]

group_labels = ['Top10 streets', 'Other streets']
colors = ['#0984e3',  '#fa8231']

# Create distplot (curve_type is left at its default, 'kde')
fig = ff.create_distplot(hist_data, group_labels,
                         colors=colors,
                         bin_size=1, show_rug=False)
# Add title
fig.update_layout(title_text='Distribution of the number of seats in establishments on the streets of Top10 and others')
fig.update_layout(yaxis_title="Density",xaxis_title="Number of Seats")
fig.show()

##### Conclusion<br />
As the graph shows, the most popular streets have a smaller proportion of establishments with a small number of seats. This is consistent with the earlier result that the average number of seats on the most popular streets is greater than on other streets.<br /><br />
Perhaps this is due to the fact that there are more customers on popular streets and restaurants can receive more guests than establishments on other streets.

<div class="alert alert-success">
<b>Reviewer's comment: </b> Yep, very good point!
</div>

##### Let's find out if the proportions of the types of establishments on central streets and on the rest are different.

In [None]:
rest_share_top10 = (
    rest[rest['top10'] == True].groupby('type').agg({'id':'count'})
    / 
    len(rest[rest['top10'] == True])
)
rest_share_top10 = rest_share_top10.reset_index()

rest_share = (
    rest.groupby('type').agg({'id':'count'})
    / 
    len(rest)
)

rest_share = rest_share.reset_index()

In [None]:
fig = go.Figure()
fig.add_trace(go.Bar(x=rest_share_top10.type,
                     y=rest_share_top10.id,
                     text=round(rest_share_top10.id,2)*100,
                     name = "Top10 streets",
#                      marker_color= color_dict[0]
                     textsrc='%{text:.1s}',
#                      texttemplate='%{text:.1%}',
                     textposition='outside',
                     textfont_size=14,
                     marker=dict(
                         color='rgba(50, 171, 96, 0.6)',
                         line=dict(
                             color='rgba(50, 171, 96,  1.0)',
                             width=1),
                     )
                    ))
fig.add_trace(go.Bar(x=rest_share.type,
                     y=rest_share.id,
                     # Bar labels as ready-made strings such as '75.3%' (replaces the no-op textsrc argument)
                     text=(rest_share.id * 100).round(1).astype(str) + '%',
                     name="Other streets",
                     textposition='outside',
                     textfont_size=14,
                     marker=dict(
                         color='rgba(249, 202, 36, 0.6)',
                         line=dict(
                             color='rgba(249, 202, 36, 1.0)',
                             width=1),
                     )
                    ))


fig.update_layout(
    title='The share of different types of establishments on the streets of Top10 and others',
    xaxis_tickfont_size=14,
    yaxis=dict(
        title='Share of establishments',
        titlefont_size=16,
        tickfont_size=14,
    ),
    bargroupgap=0.1 # gap between bars of the same location coordinate.
)
fig.show()

##### Conclusion<br />
As we can see, the shares of the different types of establishments are practically the same whether or not the establishment is located on a popular street. The only differences are a decrease in the share of **Fast Food** by **1.5%** and an increase in the share of **Restaurants** by **2%**. <br /><br />
Thus, we do not see any fundamental difference in the distribution of establishment types between the streets.
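A compact way to compare the two distributions side by side is a normalized cross-tabulation. The sketch below is only an illustration on hypothetical toy data with the same `type` and `top10` columns as `rest`; it assumes the two groups are disjoint (Top10 streets versus all remaining streets) and reports the difference in percentage points in a hypothetical `diff_pp` column.

```python
import pandas as pd

# Hypothetical toy data; in the notebook this would be the `rest` DataFrame.
toy = pd.DataFrame({
    'type':  ['Restaurant', 'Restaurant', 'Restaurant', 'Fast Food', 'Cafe',
              'Restaurant', 'Restaurant', 'Fast Food', 'Fast Food', 'Cafe'],
    'top10': [True, True, True, True, True,
              False, False, False, False, False],
})
toy['group'] = toy['top10'].map({True: 'Top10 streets', False: 'Other streets'})

# Rows: establishment type; columns: street group.
# normalize='columns' makes every column sum to 1 (share within the group).
shares = pd.crosstab(toy['type'], toy['group'], normalize='columns')

# Difference between the groups, in percentage points.
shares['diff_pp'] = (shares['Top10 streets'] - shares['Other streets']) * 100
print(shares.round(3))
```

Reporting the gap in percentage points avoids ambiguity between an absolute change in share and a relative change.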

### 3 Overall conclusion.<a name="overall_conclusion"></a>

After analyzing the establishments in Los Angeles, we found out the following: <br />

1. There are **9651** establishments in Los Angeles.
2. More than **75%** of all establishments in the city are restaurants, and only **4.5%** are cafes.
3. **38%** of establishments belong to a chain. <br />Opening a chain establishment has a number of advantages:
 - Reduced costs for marketing, purchasing and salaries of management personnel.
 - Less risk when opening, as there is already a successful and stable business model.
4. Chain establishments have fewer seats than non-chain ones (40 versus 46 on average). I suppose this is because a chain has many locations, so its customers do not travel across the city to a single place but visit the nearest one.<br /> This saves money not only on management personnel but also on rent.
5. More than 97% of cafes have a capacity of less than 50 seats.
6. The average number of seats in a cafe is 25. The average number of seats in a restaurant is 48.
7. There are no establishments with a capacity of 49-59 people; I guess this is due to taxation.
8. Establishments on the central streets have more seats (46 on average) than those on non-central streets (43 on average), even though rent on the central streets is much higher.<br /> This suggests that customer traffic on the central streets is stable.<br /> <br /> 
Neither high-quality cuisine nor a concept with a perfect design guarantees good attendance. A restaurant needs customer traffic, which depends strongly on location, so the key to success for most establishments is renting premises on streets with good pedestrian and car traffic. Few people means no profit.
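Several figures in the list above (the share of cafes under 50 seats, the average seats per type, and the empty 49-59 range) can be re-checked with a few pandas expressions. This is a minimal sketch on hypothetical toy data with the same `type` and `number` columns as `rest`; the variable names are illustrative and the values are not the project's real results.

```python
import pandas as pd

# Hypothetical toy data; in the notebook this would be the `rest` DataFrame.
toy = pd.DataFrame({
    'type':   ['Cafe', 'Cafe', 'Cafe', 'Restaurant', 'Restaurant', 'Restaurant'],
    'number': [12, 25, 48, 30, 65, 120],
})

# Share of cafes with fewer than 50 seats.
cafes = toy[toy['type'] == 'Cafe']
cafe_share_under_50 = (cafes['number'] < 50).mean()

# Average number of seats per establishment type.
avg_seats = toy.groupby('type')['number'].mean()

# Number of establishments with 49-59 seats (between() is inclusive on both ends).
gap_count = int(toy['number'].between(49, 59).sum())

print(f"Cafes under 50 seats: {cafe_share_under_50:.0%}")
print(avg_seats.round(1))
print(f"Establishments with 49-59 seats: {gap_count}")
```

A `gap_count` of zero on the real data would correspond to the empty 49-59 range noted in item 7.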

#### Recommendations:<br /><br />
We have a great concept for our establishment. It will attract customers, including tourists, since robots are still quite rare in restaurants. Because we are investing heavily, I would advise renting a place for the cafe on a street with high traffic.
<br /><br />
I would also advise considering a change in the type of establishment from a cafe to a restaurant: restaurants make up a far larger share of establishments (75% versus 4.5%), which suggests higher demand among customers, and they have more seats on average (48 versus 26).
<br /><br />
If we launch our establishment successfully and the business model brings in money consistently, I propose that the next stage in developing this business be the creation of a chain of establishments, and possibly the sale of a franchise.

<div class="alert alert-success">
<b>Reviewer's comment: </b> Excellent final. Thank you so much for the research!
</div>