## <b> $ \color{blue}{\text {Welcome!: Introduction to Python for Data Visualization with Jupyter Notebooks!} } $ </b>
***
#### $\color{purple}{\text{Course details: 34:816:656 ||||||| Rutgers University ||||||| E. J. Bloustein School}}$
***
+ Master of Public Informatics Program: https://bloustein.rutgers.edu/graduate/public-informatics/mpi/
+ Informatics graduate certificate: https://bloustein.rutgers.edu/graduate/public-informatics/mpi/certificate/
+ Join us on LinkedIn for the latest updates on MPI, informatics, DS and AI: https://www.linkedin.com/company/rutgers-masters-in-public-informatics/
+ PISG Student group for informatics / analytics / AI: https://bloustein.rutgers.edu/students/organizations/pisg/
***


+ <b> Industry updates [@ Jim Samuel](https://twitter.com/jimsamuel/) ---- https://twitter.com/jimsamuel/ </b>
***


#### Notes & More resources
+ This notebook is a collection of foundational instructions from multiple unlisted sources.
+ If you click on "Help" in the toolbar, there is a list of references for common Python tools, e.g. numpy, pandas.
+ [IPython website](https://ipython.org/) |||||||
+ [Markdown basics](https://daringfireball.net/projects/markdown/) |||||||
+ [Jupyter Notebook Documentation](https://jupyter-notebook.readthedocs.io/en/stable/index.html) |||||||
+ [Real Python Jupyter Tutorial](https://realpython.com/jupyter-notebook-introduction/) |||||||
+ [Dataquest Jupyter Notebook Tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/) |||||||
+ [Stack Overflow](https://stackoverflow.com/) |||||||

<div class="alert alert-info">
    
We support and continue to contribute to open-source code and resources BUT the contents of this for-credit and graded course are protected for ethical reasons and course integrity. Beyond use within this course, none of the course materials developed for this course may be copied, reproduced, re-published, uploaded, posted, transmitted, or distributed in any way without written authorization from the concerned faculty /authors.    
### Copyright Statement: All rights reserved.
* The contents (Jupyter notebooks, Assignments, Exams and other developed materials) presented in this course have been carefully prepared to benefit students enrolled in this course.
* <b> Therefore, for the benefit of future students and course integrity,  PLEASE DO NOT SHARE OR DISSEMINATE </b> any of these materials outside of this class so that the learning experience of future students remains unique and valuable.  
    * <b> Please do not post these materials to GitHub or to any other platform or website. </b>
    * When using Google Colab, please ensure that the file is not set up for public access (the expected default setting is private).
    
</div>

***

<div class="alert alert-info">
    
### Notebook 5: Intermediate Data Visualization - 2

<ul type="1">
<li> In this module, we will study several plots critical for informatics, data science and analytics professionals: violin plots, ridgeline plots and time-series plots. </li>
</ul>
</div>

In [2]:
# Uncomment and run below if you need to install plotly
# ! pip install plotly


### Violin Plot

+ A violin plot can be used to visualize the distribution of numerical data. Violin plots reveal more about the shape of the data than box plots, which can hide features such as multiple peaks.
+ Details about the dataset are available [here](https://vincentarelbundock.github.io/Rdatasets/doc/reshape2/tips.html)

In [None]:
import plotly.express as px

# We will use the tips dataset, which has data on restaurant bills and tips in dollars
df = px.data.tips()
fig = px.violin(df, y="total_bill")
fig.update_layout(title_text='Violin Plot', title_x=0.5)
fig.show()

Violin plot with box and data points.

In [None]:
fig = px.violin(df, y="total_bill", box=True, # draw box plot inside the violin
                points='all', # can be 'outliers', or False
               )
fig.update_layout(title_text='Violin Plot with Data Points', title_x=0.5)
fig.show()

Multiple Violin Plot

In [None]:
fig = px.violin(df, y="tip", x="smoker", color="sex", box=True, points="all",
          hover_data=df.columns)
fig.update_layout(title_text='Multiple Violin Plot: Male-Female by Smoker-Nonsmoker', title_x=0.5)
fig.show()

### Ridgeline Plot

A ridgeline plot shows the distribution of a numerical variable for several groups. Ridgeline plots can be used to visualize changes in distributions over time or space.
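
The ridgeline example in the next cell builds its synthetic data with NumPy broadcasting, which can be hard to read at first. The sketch below shows the same idea on a smaller scale. It assumes 4 groups instead of 12, uses a hypothetical seed, and leaves out the random offset added to each group center.

```python
import numpy as np

rng = np.random.default_rng(1)  # hypothetical seed for this sketch

n_groups, n_points = 4, 200
# Column vectors of shape (4, 1): one spread and one center per group
scales = np.linspace(1, 2, n_groups)[:, np.newaxis]
shifts = np.arange(n_groups)[:, np.newaxis]

# Broadcasting stretches each (4, 1) column across the 200 samples in its row
samples = scales * rng.standard_normal((n_groups, n_points)) + shifts

print(samples.shape)                  # one row of 200 values per group
print(samples.mean(axis=1).round(2))  # row means rise from group to group
print(samples.std(axis=1).round(2))   # row spreads widen from group to group
```

Each row of `samples` then becomes one ridge in the plot.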

In [None]:
import plotly.graph_objects as go
from plotly.colors import n_colors
import numpy as np
np.random.seed(1)

# 12 sets of normally distributed random data, with increasing mean and standard deviation
data = (np.linspace(1, 2, 12)[:, np.newaxis] * np.random.randn(12, 200) +
            (np.arange(12) + 2 * np.random.random(12))[:, np.newaxis])

colors = n_colors('rgb(5, 200, 200)', 'rgb(200, 10, 10)', 12, colortype='rgb')

fig = go.Figure()
for data_line, color in zip(data, colors):
    fig.add_trace(go.Violin(x=data_line, line_color=color))

fig.update_traces(orientation='h', side='positive', width=3, points=False)
fig.update_layout(xaxis_showgrid=False, xaxis_zeroline=False)
fig.update_layout(title_text='Ridgeline Plot', title_x=0.5)
fig.show()

### Basic Time Series Plot

Time-series plots show how data change over time. Let's see an example of a time-series plot showing the relative change in Microsoft (MSFT) share price over time. In this dataset, each price is indexed so that its value on the first date equals 1.0.

In [None]:
import plotly.express as px
df = px.data.stocks()
df.head()

|   | date | GOOG | AAPL | AMZN | FB | NFLX | MSFT |
|---|------|------|------|------|----|------|------|
| 0 | 2018-01-01 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | 2018-01-08 | 1.018172 | 1.011943 | 1.061881 | 0.959968 | 1.053526 | 1.015988 |
| 2 | 2018-01-15 | 1.032008 | 1.019771 | 1.05324 | 0.970243 | 1.04986 | 1.020524 |
| 3 | 2018-01-22 | 1.066783 | 0.980057 | 1.140676 | 1.016858 | 1.307681 | 1.066561 |
| 4 | 2018-01-29 | 1.008773 | 0.917143 | 1.163374 | 1.018357 | 1.273537 | 1.040708 |


In [None]:
import plotly.express as px

df = px.data.stocks()

fig = px.line(df, x='date', y="MSFT")
fig.update_layout(title_text='Time-Series Plot', title_x=0.5)
fig.show()

***
#### Optional visualization below
***

In [None]:
# ! pip install yfinance       # leave this viz. out if the installs are not working, try it if you are curious.

In [None]:
import yfinance as yf

# Initializing the Ticker object
INTC_ticker = yf.Ticker('INTC')

# Extracting the history
INTC_df = INTC_ticker.history(start='2021-07-07')

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots


# Next, this is where you create the subplot with a second y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])


# Create line-plot using closing prices and bar-plot for volumes
fig.add_trace(go.Scatter(x=INTC_df.index, y=INTC_df['Close'], name='Intel Closing Price'), secondary_y=False)
fig.add_trace(go.Bar(x=INTC_df.index, y=INTC_df['Volume'], name='Intel Volume', opacity=0.5), secondary_y=True)

# Updating layout
fig.update_layout(
    title='Quick View of Closing Prices for Intel - from 7/7/21 to date',
    xaxis_title='Date',
    yaxis_title='Price',
    xaxis_rangeslider_visible=True,
    hovermode='x'
)

# Disabling grid of second y-axis
fig.layout.yaxis2.showgrid=False

# Showing figure
fig.show()

In [None]:
# change the template for a different style:

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots


# Next, this is where you create the subplot with a second y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])


# Create line-plot using closing prices and bar-plot for volumes
fig.add_trace(go.Scatter(x=INTC_df.index, y=INTC_df['Close'], name='Intel Closing Price'), secondary_y=False)
fig.add_trace(go.Bar(x=INTC_df.index, y=INTC_df['Volume'], name='Intel Volume', opacity=0.5), secondary_y=True)

# Updating layout
fig.update_layout(
    title='Quick View of Closing Prices for Intel - from 7/7/21 to date',
    xaxis_title='Date',
    yaxis_title='Price',
    template='plotly_dark',
    xaxis_rangeslider_visible=True,
    hovermode='x'
)

# Disabling grid of second y-axis
fig.layout.yaxis2.showgrid=False

# Showing figure
fig.show()