# Plotly

> Plotly is both an open-source library and a company headquartered in Montreal, Quebec. The library is released under the MIT License. The company also offers a paid enterprise version with enhanced features and technical support.

> Plotly is widely used in the data science community for data exploration and analysis because its charts are interactive. Plotly supports several programming languages, and its Python library is among the most commonly used for building charts.

The three elements inside a plot are as follows: 

- Data
- Frame and Layout
- Figure
 

For example, consider Google's stock price over a period of time. The dataset is shown in the image below. Its attributes are:

- Date - Day on which the stock price was recorded
- High - Highest stock price attained on that particular day
- Low - Lowest stock price attained on that particular day
- Open - Opening stock price on that particular day
- Close - Closing stock price on that particular day

![google stock](img/intro_google_stock.png)

When you display the closing stock price from the dataset as a simple line chart, you get the following output.

![google stock](img/intro_google_stock_fig.png)

Now let us map the elements of the plotly chart in the image generated above.

- Trace - The blue line, which represents Google's stock price on various dates (X and Y axes), is called the trace. A trace is simply the data that is to be plotted on a graph.

- Data - The trace is enclosed in an object called the 'data object'.

- Layout - The title of the graph and the X-axis and Y-axis names are attributes of the layout object. The layout object styles the chart and gives labels to the graph.

- Figure - For users to view this graph, the data and layout objects have to be placed inside a container called the figure (as explained in the video).

> This figure container is passed to the plotly Python API, which converts it into JSON before passing it to the underlying Plotly.js library. The plotly Python API is a wrapper built over the Plotly.js JavaScript library. The Python API also allows you to view this JSON by calling `fig.to_json()`.


Let's summarize the three elements of the Plotly graph.

### Data:

The first of the three top-level attributes of a figure is data, which contains traces. There can be multiple traces for a given chart. Below is a diagrammatic representation of a data object containing two traces.

![intro_data](img/intro_data.png)

The image below is another representation of the data object. You can see that multiple traces are encapsulated inside the data object.

![intro_data](img/intro_data_2.png)


### Layout and Frames:

The second top-level attribute of a figure is layout, whose value contains attributes such as titles, legends, axis labels, templates, fonts, dimensions and margins (you will take a look at each feature in the upcoming sessions). Frames are mostly used in animated charts, which are more advanced visualisations and will be covered in the upcoming sessions. Let's see the diagrammatic representation of the layout object.
![intro_layout](img/intro_layout.png)


### Figure

Finally, a figure puts together the data and the layout into a displayable chart. It can be thought of as a container that is serialised into JSON for rendering the chart. Let's see the diagrammatic representation of the figure object.

![figure](img/intro_figure.png)

#### Comparison among various visualization libraries in Python

![comparison](img/comparison.png)

### Advantages of Plotly
- Interactivity: It allows users to interact with the charts they create. Assume that you have a time series plot for steel manufacturing that was created with Seaborn. To inspect the process at a particular time interval, you need to zoom in and look carefully. With Seaborn, you need to plot all over again with the new inputs. Plotly, however, lets you zoom in on the existing chart (you will take a look at this in detail in the coding sections).

- Adaptability: Plotly also adapts to different programming languages and environments. It provides support for JavaScript, React, R, Julia, MATLAB and even Arduino.

- Variety: A wide range of chart types is available in the Plotly libraries, spanning many domains.

- Scalability: The charts adjust to changes in the range of the data, and they also allow users to perform statistical analysis over a range of data.

- User-friendly: Generating plots with plotly is quite easy and less time-consuming, especially when visualising complex charts.

#### plotly.graph_objs

`plotly.graph_objs` (also importable as `plotly.graph_objects`) is the module used for plotting graphs in plotly. Its primary classes are `Figure` and `FigureWidget`. A figure, as discussed in the previous segment, is a container consisting of data and layout elements. `FigureWidget` is a figure built on the ipywidgets library, which is used to implement the dynamic aspects of plotly (discussed in detail in later sessions).

#### Plotly Express 

The graph objects syntax of Plotly is more verbose than that of Seaborn or Matplotlib. Plotly Express makes interactive plotting simpler through high-level functions. It is a wrapper built on top of plotly.graph_objs. Just as seaborn, which is built on top of matplotlib, made statistical plotting simpler, Plotly Express makes Plotly simpler to use. It is the recommended entry point into the plotly library.

> Plotly.graph_objs

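    import plotly.express as px  # used here only to load the sample gapminder data
    import plotly.graph_objs as go

    # 2007 slice of the gapminder dataset bundled with Plotly
    gapminder2007 = px.data.gapminder().query("year == 2007")

    fig = go.Figure(data=[
        # histfunc="sum" adds up y for each x category; the default "count" only counts rows
        go.Histogram(name='GDP', x=gapminder2007["country"], y=gapminder2007["gdpPercap"], histfunc="sum"),
    ])
    fig.update_layout(
        title_text='GDP of various countries',
        xaxis=dict(title='Countries'),
        yaxis=dict(title='GDP'),
    )
    fig.show()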

    
> Plotly Express

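    import plotly.express as px

    # 2007 slice of the gapminder dataset bundled with Plotly
    gapminder2007 = px.data.gapminder().query("year == 2007")
    fig = px.histogram(gapminder2007, x="gdpPercap", hover_name="country", marginal="box")
    fig.show()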

You can see that the Plotly Express code takes only a few lines. However, in this module, you will learn to plot graphs using plotly.graph_objs, because this makes learning Plotly Express simpler and, above all, helps you understand what the Plotly Express code does behind the scenes before the figure reaches the plotly.js layer. As a programmer, you should know not only what you are plotting but also what is happening behind the scenes.


Plotly offers more than 13 different types of basic charts. Some of the most commonly used ones, which you will have come across in other plotting libraries and which are used on a day-to-day basis, are presented in the image given below.
    
![basic charts](img/plotly_basic_charts.png)

Plotly offers 14 different types of statistical charts and 8 types of financial charts. The image given below presents the charts that are widely used in these domains.

![statistical charts](img/statistical%20charts.png)

In [26]:
import plotly.graph_objects as go
import pandas as pd

## Bar Chart

In [27]:
#read the data from the file using pandas
df = pd.read_csv("data/Titanic_Dataset.csv")
df.head()

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |


In [28]:
#store the number of passengers who survived and who did not survive
survival_label = [
    int((df['Survived'] == 1).sum()),  # survived
    int((df['Survived'] == 0).sum()),  # did not survive
]

In [29]:
survival_label

[342, 549]

In [30]:
#create the figure container with data object. The trace is a bar chart
fig = go.Figure([go.Bar(
    x = ['Yes','No'],
    y = survival_label,
)])

# create the layout object
fig.update_layout(
    title_text = "Titanic dataset",
    xaxis = dict(
        title = "Survival Category"
    ),
    yaxis = dict(
        title = "No of Passengers"
    )
)

#plot the figure container
fig.show()

In [31]:
#store the number of passengers who survived, split by sex (male, female)
survival_yes_cat = []
survival_yes_cat.append(df['Survived'].where((df['Survived']==1) & (df['Sex']=='male')).count())
survival_yes_cat.append(df['Survived'].where((df['Survived']==1) & (df['Sex']=='female')).count())

#store the number of passengers who did not survive, split by sex (male, female)
survival_no_cat = []
survival_no_cat.append(df['Survived'].where((df['Survived']==0) & (df['Sex']=='male')).count())
survival_no_cat.append(df['Survived'].where((df['Survived']==0) & (df['Sex']=='female')).count())

In [32]:
#create the figure container with data object. The trace is a bar chart
fig = go.Figure(
    data = [
        go.Bar(name = 'Survived', x = ['No of male','No of Female'], y = survival_yes_cat),
        go.Bar(name = 'Did not Survive', x = ['No of male','No of Female'], y = survival_no_cat),
    ]
)
# create the layout object with barmode as group
fig.update_layout(barmode = 'group')
#plot the figure container
fig.show()

In [33]:
#store the number of passengers who survived separately for each class. 
survival_yes_cls = []
survival_yes_cls.append(df['Survived'].where((df['Survived']==1) & (df['Pclass']==1)).count())
survival_yes_cls.append(df['Survived'].where((df['Survived']==1) & (df['Pclass']==2)).count())
survival_yes_cls.append(df['Survived'].where((df['Survived']==1) & (df['Pclass']==3)).count())

#store the number of passengers who did not survive separately for each class.
survival_no_cls = []
survival_no_cls.append(df['Survived'].where((df['Survived']==0) & (df['Pclass']==1)).count())
survival_no_cls.append(df['Survived'].where((df['Survived']==0) & (df['Pclass']==2)).count())
survival_no_cls.append(df['Survived'].where((df['Survived']==0) & (df['Pclass']==3)).count())

#add both survival_no_cls and survival_yes_cls to get the total number of passengers in each class
survival_total_cls = [a + b for a, b in zip(survival_yes_cls, survival_no_cls)]
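The per-class counts above can also be obtained in one step with `pd.crosstab`. This is an alternative sketch, not the approach used in this notebook, and the `toy` dataframe is a hypothetical miniature of the Titanic columns.

```python
import pandas as pd

# Hypothetical miniature of the Pclass and Survived columns
toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Rows are passenger classes, columns are the Survived values 0 and 1
counts = pd.crosstab(toy["Pclass"], toy["Survived"])

survived = counts[1].tolist()            # survivors per class
not_survived = counts[0].tolist()        # non-survivors per class
total = counts.sum(axis=1).tolist()      # all passengers per class

print(survived, not_survived, total)
```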

In [34]:
#create the figure container with data object. The trace is a bar chart
fig = go.Figure(data = [
    go.Bar(name = "Total", x = ['Class 1', 'Class 2','Class 3'], y = survival_total_cls),
    go.Bar(name = "Survived", x = ['Class 1', 'Class 2','Class 3'], y = survival_yes_cls),
    go.Bar(name = "Not Survived", x = ['Class 1', 'Class 2','Class 3'], y = survival_no_cls),
])

# create the layout object with barmode as stack
# note: because "Total" is stacked together with its two parts, each bar is twice the class total
fig.update_layout(barmode = 'stack')

#plot the figure container
fig.show()

#### Assessment 

- You are provided with a dataset named ‘Public talks’. The dataset contains multiple details of public talks that are organised in various events. In the questions, you will focus on only a few features of the data set, which include the following:

    + Comments - Number of comments on the speakers’ talks on various social media platforms.
    + Duration - Duration of a talk delivered by a speaker. In the data set, the duration is expressed in seconds.
    + Main Speaker - Name of the speaker who delivered the talk.
    
#### Problem Statement: 
- Answer the following questions by creating a stacked bar chart in Plotly for the top three speakers who received the highest number of comments for their talks. Each element on the x-axis, which in this case is one of the top three speakers, will have a separate stacked bar. The layers of each stacked bar should represent the following information:

- The first layer of the stacked bar should show the total number of comments received for the speaker's talk.

- The second layer of the stacked bar should show the duration of the talk. This layer should display the duration of the speaker's talk when you hover over it.

##### Q1. What are the names of the top three speakers who received the highest number of comments, in the order first, second and third?
##### Q2. What is the speech duration of 'Ken Robinson' as seen when you hover over his stacked bar?

In [35]:
# Answer
# Richard Dawkins, Ken Robinson, Sam Harris


# Read the data from file
dataframe = pd.read_csv("data/Public+talks.csv")

# Sort the data and get the top three speakers with the highest number of comments
data_sorted = dataframe.sort_values(by='comments', ascending=False)
data_comments = data_sorted.iloc[:3, :]

# Create two traces in the figure container
fig = go.Figure(data = [
    go.Bar(x = data_comments.main_speaker, y = data_comments.comments, name = 'comments',
           marker = dict(color = 'rgb(58,200,225)')),
    # duration is stored in seconds in the dataset; the talk title is attached as text
    go.Bar(x = data_comments.main_speaker, y = data_comments.duration, name = 'duration (seconds)',
           text = data_comments.title, marker = dict(color = 'rgb(158,202,225)')),
])

# Create the layout object
fig.update_layout(xaxis = {'title': 'Top 3 speakers'}, barmode = 'stack',
                  title = 'Number of comments and speech duration of the 3 most commented speakers')

# Visualize the figure container
fig.show()
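The "sort, then take the first three rows" step can also be written with `DataFrame.nlargest`. The sketch below uses a hypothetical four-row table with the same column names, and it converts the duration from seconds to minutes as an optional extra step.

```python
import pandas as pd

# Hypothetical stand-in for the 'Public talks' columns used in the assessment
talks = pd.DataFrame({
    "main_speaker": ["A", "B", "C", "D"],
    "comments": [120, 450, 300, 90],
    "duration": [600, 1200, 900, 300],   # seconds
})

# Three rows with the highest number of comments, in descending order
top3 = talks.nlargest(3, "comments")

# Optional: express the duration in minutes instead of seconds
top3 = top3.assign(duration_min=top3["duration"] / 60)

print(top3[["main_speaker", "comments", "duration_min"]])
```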

# Line Charts

### Dataset Information
The dataset is specifically about India and has the following fields:

- date (year of record)
- GDP_Per_Capital
- Annual_Growth_Rate_percent (in percentage)
- Population

In [44]:
# Importing Data Set into a Pandas Dataframe
df = pd.read_csv("data/EHL_GDP_Population_Data.csv")
df.head()

| Unnamed: 0 | date | GDP_Per_Capital | Annual_Growth_Rate_percent | Population |
|---|---|---|---|---|
| 0 | 1960 | 82.1886 | 0.0 | 450547679 |
| 1 | 1961 | 85.3543 | 3.85 | 459642165 |
| 2 | 1962 | 89.8818 | 5.3 | 469077190 |
| 3 | 1963 | 101.1264 | 12.51 | 478825608 |
| 4 | 1964 | 115.5375 | 14.25 | 488848135 |


### Purpose
Observe the change in GDP_Per_Capital with respect to Population using scatter and line plots.

### Scatter Plot


In [45]:
fig = go.Figure()

#Adding traces
fig.add_trace(go.Scatter(x=df['Population'], y = df['GDP_Per_Capital'],
                        mode = 'markers',
                        name = 'markers'))

#Add the layout
fig.update_layout(title_text = 'India - Historical GDP vs population',
                 xaxis = dict(
                     title='Population'
                 ),
                 yaxis = dict(
                     title = 'GDP per capita'
                 ))
fig.show()


## Anomaly detection 

### NAB Dataset
The Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating anomaly detection algorithms in streaming, online applications. It is available on Kaggle.

In [40]:
# Importing Data Set into a Pandas Dataframe
df = pd.read_csv("data/NAB_Anomaly_Sample.csv")
df.head()

| Unnamed: 0 | timestamp | value |
|---|---|---|
| 0 | 2015-07-10 14:24:00 | 564 |
| 1 | 2015-07-10 14:38:00 | 730 |
| 2 | 2015-07-10 14:48:00 | 770 |
| 3 | 2015-07-10 15:03:00 | 910 |
| 4 | 2015-07-10 15:22:00 | 1035 |


### Purpose
Using Line charts and markers to identify and plot anomalies in the given dataset

In [41]:
# Anomalous data points
df_anomalous_pts = df.loc[df.value > 2000]

In [42]:
fig = go.Figure()

# Add traces
fig.add_trace(go.Scatter(x=df_anomalous_pts['timestamp'], y=df_anomalous_pts['value'],
                    mode='markers',
                    name='Anomalies'))
fig.add_trace(go.Scatter(x=df['timestamp'], y=df['value'],
                    mode='lines',
                    name='lines'))
#Setting Layout
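# titlefont is deprecated; the title font is set through title.font instead
fig.update_layout(title_text='NAB Anomaly detection Dataset',
                  xaxis=dict(
                                title=dict(text='Timestamp', font=dict(size=16)),
                                tickfont=dict(size=14),
                            ),
                   yaxis=dict(
                                title=dict(text='Value', font=dict(size=16)),
                                tickfont=dict(size=14),
                            ),
                 )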
fig.show()

##### Problem Statement: 
Answer the following questions by creating three separate line charts for the following three events: 'TEDGlobal 2005', 'TED2006' and 'TED2002'. Each line chart should show the number of views received by all the speakers who gave a talk at the respective event. Note the following points:

- The line graphs should have a different colour for each event.
- The graphs should have a legend entry for each event.
- The line charts should show the name of the speaker when the cursor hovers over a point.

##### What is the number of views for 'Dan Gilbert', who gave his speech at the event 'TEDGlobal 2005'? (Approximate answer)

In [43]:
# 3.7 Million

# Read the data from file
dataframe = pd.read_csv("data/Public+talks.csv")

# Filter the dataframe to keep one subset per event
dfGlobal = dataframe[dataframe['event'] == 'TEDGlobal 2005']
df2002 = dataframe[dataframe['event'] == 'TED2002']
df2006 = dataframe[dataframe['event'] == 'TED2006']

# Add three traces to the figure container; text supplies the speaker name on hover
fig = go.Figure(data = [
    go.Scatter(x = dfGlobal.index, y = dfGlobal.views, mode = "lines+markers", name = "TEDGlobal 2005",
               marker = dict(color = 'rgba(255, 128, 255, 0.8)'), text = dfGlobal.main_speaker),
    go.Scatter(x = df2002.index, y = df2002.views, mode = "lines+markers", name = "TED2002",
               marker = dict(color = 'rgba(255, 128, 2, 0.8)'), text = df2002.main_speaker),
    go.Scatter(x = df2006.index, y = df2006.views, mode = "lines+markers", name = "TED2006",
               marker = dict(color = 'rgba(0, 128, 2, 0.8)'), text = df2006.main_speaker),
])

# Create the layout object
fig.update_layout(title = 'Number of views received at TEDGlobal 2005, TED2002, TED2006',
                  xaxis = {'title': 'index', 'ticklen': 5, 'zeroline': False},
                  yaxis = {'title': 'Views', 'ticklen': 5, 'zeroline': False})

# Visualise the figure container
fig.show()

##### Q2. Considering the problem statement in the previous question, what are the names of the speakers who got the highest views at the events 'TEDGlobal 2005', 'TED2002' and 'TED2006' respectively?

In [47]:
# Barry Schwartz, Richard Dawkins, Ken Robinson


# The figure is the same three-trace line chart built for the previous question;
# the answer is read from the highest point of each line.

# Bubble Plots

### Dataset Information
A collection of information about 227 countries, containing population, GDP, area and many other features.

In [49]:
#Importing Libraries
import plotly
import plotly.graph_objects as go
import pandas as pd

In [51]:
# Importing Data Set into a Pandas Dataframe
df = pd.read_csv("data/world_gdp.csv")
df.head()

| Unnamed: 0 | Country | Region | Population | Area (sq. mi.) | Pop. Density (per sq. mi.) | Coastline (coast/area ratio) | Net migration | Infant mortality (per 1000 births) | GDP ($ per capita) | Literacy (%) | Phones (per 1000) | Arable (%) | Crops (%) | Other (%) | Climate | Birthrate | Deathrate | Agriculture | Industry | Service |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | ASIA (EX. NEAR EAST) | 31056997 | 647500 | 480 | 0 | 2306 | 16307 | 700.0 | 360 | 32 | 1213 | 22 | 8765 | 1 | 466 | 2034 | 38.0 | 24.0 | 38.0 |
| 1 | Albania | EASTERN EUROPE | 3581655 | 28748 | 1246 | 126 | -493 | 2152 | 4500.0 | 865 | 712 | 2109 | 442 | 7449 | 3 | 1511 | 522 | 232.0 | 188.0 | 579.0 |
| 2 | Algeria | NORTHERN AFRICA | 32930091 | 2381740 | 138 | 4 | -39 | 31 | 6000.0 | 700 | 781 | 322 | 25 | 9653 | 1 | 1714 | 461 | 101.0 | 6.0 | 298.0 |
| 3 | American Samoa | OCEANIA | 57794 | 199 | 2904 | 5829 | -2071 | 927 | 8000.0 | 970 | 2595 | 10 | 15 | 75 | 2 | 2246 | 327 |  |  |  |
| 4 | Andorra | WESTERN EUROPE | 71201 | 468 | 1521 | 0 | 66 | 405 | 19000.0 | 1000 | 4972 | 222 | 0 | 9778 | 3 | 871 | 625 |  |  |  |


### Purpose
To reuse the scatter plot with an additional feature that controls the size of each marker (bubble).

In [52]:
filter_df = df.loc[df.Population > 20000000]
# running index used to give each bubble a different colour
count = list(range(len(filter_df)))

In [53]:
#create the figure container
fig = go.Figure()

# Add a trace: marker size encodes the area, marker colour encodes the row position
fig.add_trace(go.Scatter(x=filter_df['GDP ($ per capita)'], y=filter_df['Population'],
                    mode='markers',
                    name='markers',
                    marker= dict(size= (filter_df['Area (sq. mi.)']/100000),
                         color = count)
                        ))

#create the layout object (the axis titles match the x and y data above)
fig.update_layout(title_text='World - GDP vs Population',
                xaxis = dict(title = 'GDP ($ per capita)'),
                yaxis = dict(title = 'Population'))

#plot the figure container
fig.show()

#### Assessment

You are provided with a dataset named ‘Planet’. The data set contains multiple details and features of planets. In the questions, you will focus on only a few features of the data set, which include the following:

- Planet 
- no_of_moons
- distance_from_sun
- planet_diameter

##### Problem Statement: 
Answer the following questions by creating a bubble plot in Plotly with the following terms and conditions.

Defining the Bubble Chart
- The planets are on the X-axis.
- The Y-axis denotes the distance from the sun (in million km).
- The marker size represents the planet diameter.
- The colour represents the number of moons.
- The hover text should show the planet diameter.
- You can set `sizeref` to 1000 to scale the markers.

##### Which planet has the highest number of moons? Give your answer by observing the values on the colour scale.

In [55]:
# Jupiter has the highest number of moons, with a value of 67. The following is the code:

# Read the data from file
data = pd.read_csv('data/planets.csv')

# Add a trace to the figure container

fig = go.Figure(
    data = [
        go.Scatter(x = data['planet'],
                   y = data['distance_from_sun'],
                   mode = 'markers',
                   marker =dict(size = data['planet_diameter'],
                                sizeref = 1000,
                                color = data['no_of_moons'],
                                colorscale = 'Rainbow',
                                showscale = True
                              ),
                   text =  [str(dia) + ' km' for dia in data['planet_diameter']]),
          ]
)

 

# Create the layout object
fig.update_layout(height = 600, 
                   width = 900,
                   title = 'Planets of our Solar System')

 

# Plot the figure container
fig.show()

# Time Series and Candle Stick Charts

#### Time series charts

Time series charts are used to represent data points at successive intervals. The horizontal axis (x-axis) is used to plot dates or time intervals, and the vertical axis (y-axis) is used to plot the values that are measured. Each data point in the chart corresponds to a quantity measured at a specific time. Time series charts are used extensively for univariate and multivariate analysis of data over time.

 

#### Candlestick charts

A candlestick chart is a type of time series chart that is widely used for stock market analysis. Similar to a bar in a bar chart, each 'candlestick' shows the market's open, high, low and close prices for the day.

 

In [56]:
# Importing Data Set into a Pandas Dataframe
df = pd.read_csv("data/Financial_Astrazeneca_shares.csv")
# Shorten the price column names in a single rename call
df.rename(columns = {'Open Price': 'Open', 'High Price': 'High',
                     'Close Price': 'Close', 'Low Price': 'Low'}, inplace = True)
df.head()

Unnamed: 0,Date,Open,Close,High,Low,Volume
0,2015-10-09,4209.0,4209.0,4219.0,4133.5,991854
1,2015-10-12,4198.5,4180.0,4206.5,4154.0,558460
2,2015-10-13,4172.0,4141.0,4187.0,4116.5,910524
3,2015-10-14,4110.0,4061.0,4110.0,4052.5,817526
4,2015-10-15,4081.0,4117.0,4130.0,4073.0,867805


### Timeseries

In [57]:
#create the figure container with data object. The trace is a scatter chart
fig  = go.Figure(go.Scatter(
    x = list(df['Date']),
    y = list(df['Close'])
))

#plot the figure container
fig.show()

### Candlestick

In [58]:
#create the figure container with data object. The trace is a candlestick chart
fig = go.Figure(data = [ go.Candlestick(
    x = df['Date'],
    open  = df['Open'],
    high = df['High'],
    low = df['Low'],
    close = df['Close']
)])

# create the layout object with the range slider hidden
fig.update_layout(title_text = "Stock market - Candlestick chart",
                  xaxis = dict(
                      rangeslider = dict(
                          visible = False
                      )
                  ))

#plot the figure container
fig.show()

##### Problem Statement: 
Answer the following questions by creating a candlestick chart in Plotly using the graph_objects library for the "Google" dataset.

In a Plotly candlestick chart, a candle is drawn in green by default when the closing price of the day is higher than its opening price, and in red when it is lower. To compare a day with the closing price of the previous day, hover over the two candles and read their close values.
##### Keeping this in mind, did Google's closing stock price increase or decrease on October 8th 2010 as compared to the closing price on October 7th 2010?

 

In [59]:
# Increase
# Read the data file
df = pd.read_csv('data/GOOGLE.csv')

# Extract the year from the date
df['Year'] = pd.DatetimeIndex(df['Date']).year

# Filter the data to contain only the 2010 stock market prices
df = df[df['Year'] == 2010]

 

# Add a trace to the figure container
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                open=df['Open'],
                high=df['High'],
                low=df['Low'],
                close=df['Close'])])

 

# Add the layout object

fig.update_layout(title = 'Closing Stock price of Google in Year 2010',
              xaxis= dict(title= 'Date',ticklen= 5,rangeslider = dict(visible = False),zeroline= False),
              yaxis= dict(title= 'Closing Stock price($)',ticklen= 5,zeroline= False)
             )

 

# Plot the figure container
fig.show()



##### Problem Statement: 
Answer the following question by creating a candlestick chart in Plotly using the graph_objects library for the "Amazon" dataset.

##### When was the stock price of Amazon the highest (consider all the columns of the dataset: High, Low, Close, Open), and what was its value?

In [61]:

# September 4th 2018, 2050.5

df = pd.read_csv('data/AMAZON.csv')
df['Year'] = pd.DatetimeIndex(df['Date']).year

fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                open=df['Open'],
                high=df['High'],
                low=df['Low'],
                close=df['Close'])])

fig.update_layout(title = 'Stock price of Amazon',
              xaxis= dict(title= 'Date',ticklen= 5,rangeslider = dict(visible = False),zeroline= False),
              yaxis= dict(title= 'Closing Stock price($)',ticklen= 5,zeroline= False)
             )
fig.show()

The values 'Street1', 'Street2', 'Street3', 'Front Lobby', 'Rear Lobby' and 'Conference Room' in the locations array are represented as 0, 1, 2, 3, 4 and 5 respectively in the source and target arrays. The value array contains the number of people who moved from each source to its target. For example, the first values of the three arrays source, target and value are 0 (Street1), 3 (Front Lobby) and 2 respectively, which means that 2 people moved from Street1 to the Front Lobby. Let's summarize all the parameters:

- source: an array giving the starting point of each flow.

- target: an array giving the destination or end point of each flow.

- value: an array giving the quantity that flows between each source and its target.

- label: an array consisting of the names of the nodes.

- color: the colour of the nodes in the graph.

- line: the border colour for all nodes in the graph.

- thickness: the width of the nodes.