# DATA VISUALIZATION WITH PLOTLY
by Nitin Gupta, UofSC student, 2022
This notebook walks you through visualizing your data as various types of graphs using the plotly and pandas libraries.

## Installing and importing plotly and pandas

First, we need to install the libraries on our system. This takes only a few lines of code.

In [None]:
import sys
!{sys.executable} -m pip install plotly
!{sys.executable} -m pip install pandas

After installing the libraries, we need to import them before we can use them in this notebook.

In [None]:
import plotly.express as px
import pandas as pd

$~$

## Reading your file
To read the file that you want to visualize, you need its relative path (relative to the folder this notebook is in). Once you have that path, type it in place of **MyfilePath** in the `filePath` variable below.

For example, if the file is saved in a separate folder named "data" inside the same directory as this notebook, your filePath variable will look like this:
`filePath = "data/sample.csv"`

If the file is in the same folder as this notebook, you can simply type the file name, as below.

In [None]:
#filePath = "MyfilePath"  # insert your file path
filePath = "cereal.csv"  # insert your file path

Then run this line to load the data from that file into a pandas DataFrame.

In [None]:
df = pd.read_csv(filePath)
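After loading, it can help to look at the first rows and at each column's data type, since (as described in the color notes further down) Plotly treats numeric and text columns differently. This is a minimal sketch that uses a tiny made-up CSV held in memory instead of your file; the column names and values are hypothetical, and the result is stored in `sampleDf` so it does not overwrite your `df`.

```python
import io

import pandas as pd

# A tiny made-up CSV standing in for your file (hypothetical values, not from cereal.csv).
csvText = """name,mfr,calories
Cereal A,K,110
Cereal B,G,120
Cereal C,K,70
"""

# read_csv accepts a file-like object as well as a path string.
sampleDf = pd.read_csv(io.StringIO(csvText))

print(sampleDf.head())   # first five rows (here, all three)
print(sampleDf.dtypes)   # "calories" is numeric; text columns are not
```

On your own data, `df.head()` and `df.dtypes` work the same way.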

<br>

## Printing out the data

By default, when a DataFrame is too large to display in full, the `print(df)` command truncates it: for a long table it prints only the first five and last five rows, and for a wide table only the first and last few columns. The three dots (`...`) in the middle of the output mark where rows or columns have been hidden.

In [None]:
print(df)

However, to display all (or more) of the rows and/or columns, run either or both of the following commands.
<br> **CAUTION**: Your dataset might have many thousands or even millions of rows or columns, so displaying all of them can slow your browser down.
<br>
<br> In these commands, pass `None` as the second argument to show all of the columns or rows, or replace `None` with an integer to show at most that many columns or rows (larger tables are truncated again).

In [None]:
# For changing the columns: None shows every column.
pd.set_option('display.max_columns', None)
# Alternatively, drop the columns you don't want to see (replace with your own column names):
#df = df.drop(['<column name 1>', '<column name 2>'], axis=1)

# For changing the rows: tables with more than 10 rows are truncated.
pd.set_option('display.max_rows', 10)

Then run `print(df)` again to display the data with the new row and column settings.

In [None]:
print(df)
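`pd.set_option` changes the display settings for the rest of the session. If you only want different settings for a single printout, pandas also provides `pd.option_context`, which applies options inside a `with` block and restores the previous values afterwards. A minimal sketch, using a hypothetical 100-row, 30-column table:

```python
import pandas as pd

# Hypothetical long and wide table, used only to demonstrate the display options.
sampleDf = pd.DataFrame({f"col{i}": range(100) for i in range(30)})

before = pd.get_option("display.max_rows")

# Inside the block, every column is shown and tables longer than 20 rows are truncated.
with pd.option_context("display.max_rows", 20, "display.max_columns", None):
    print(sampleDf)

# Outside the block, the earlier setting is back in effect.
after = pd.get_option("display.max_rows")
print(before == after)
```

To return an option to pandas' built-in default instead, there is `pd.reset_option`, for example `pd.reset_option('display.max_rows')`.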

<br>
<br>

### Instructions
<br> For each of the following graphs, you need to define which columns from your data go on the x and y axes.
To do this, put those column names (inside the quotation marks) in the xAxisName and yAxisName variables.
Similarly, to add a title, change the title variable (inside the quotation marks) to your desired title.

<br>

#### For exporting figures as files...

1. For interactive HTML files: 
<br>Below each cell that creates a figure is a cell that exports that figure to an .html file. Just run that cell to export the figure.
<br>In that cell, replace "path/to/file" with the destination folder and file name you want, keeping the ".html" extension.
<br>For example, "data/sampleFile.html" saves the file as "sampleFile.html" inside the data folder in this notebook's directory. The data folder must already exist.
<br>

2. For .png files:
<br>The easiest way to export a figure as a .png file is to draw it and then hover over it. A row of icons (the modebar) appears at the top right of the figure; click the camera icon (the leftmost option) to save the figure locally as a .png file.

**Note about choosing colors**
<br>With the following graphs you can pass in a column name from your dataset as the color argument.

<br> If the column is numeric, Plotly automatically maps its values to a continuous color scale. 
<br> However, if the column contains strings, the colors are treated as discrete (also known as categorical or qualitative), with one color per distinct value.

<br>

When defining the other variables for a specific graph, you can also set the color variable, which looks like this:
```Python 
color = "<insert column name here>" 
```

If your data column contains numeric values, you can convert those numbers to strings so that Plotly treats them as discrete categories:
```Python
df["<column name>"] = df["<column name>"].astype(str)
```

Similarly, you can convert strings to numbers (this raises an error if any value cannot be parsed as a number):
```Python
df["<column name>"] = df["<column name>"].astype(float)
```
<br>

*In these commands, replace `<insert column name here>` or `<column name>` with the name of your column.*

<br>

**NOTE:**
<br>
To use any of the three commands above, uncomment them within that specific graph cell. To uncomment a line, remove the pound (`#`) symbol at its start.
<br> <br>
If you set the `color` variable, you also need to pass it to the plotting function. In the call that draws the graph (usually the line just before the `figure.show()` command), replace
```Python
color = None
```
with
```Python
color = color
```
If that call has no `color` argument, add `color = color` inside its parentheses.
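If `astype(float)` fails because a column contains entries that are not numbers (such as "n/a"), pandas' `pd.to_numeric` with `errors="coerce"` is one alternative: it converts what it can and turns the rest into missing values (NaN), which Plotly leaves out of the plot. A minimal sketch with a hypothetical column:

```python
import pandas as pd

# Hypothetical column in which one entry is not a number.
sampleDf = pd.DataFrame({"calories": ["110", "120", "n/a"]})

astypeFailed = False
try:
    sampleDf["calories"].astype(float)
except (ValueError, TypeError) as err:
    # astype(float) raises an error because "n/a" cannot be parsed as a number.
    astypeFailed = True
    print("astype(float) failed:", err)

# errors="coerce" turns unparseable values into NaN instead of raising.
sampleDf["calories"] = pd.to_numeric(sampleDf["calories"], errors="coerce")
print(sampleDf["calories"].tolist())
```

On your own data, this would look like `df["<column name>"] = pd.to_numeric(df["<column name>"], errors="coerce")`.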

<br><br>

## Line graph
Select a column from your data and type its name into the xAxisName variable, and another column for the yAxisName variable.

In [None]:
#title = "YOUR TITLE HERE"
#xAxisName = "YOUR COLUMN NAME HERE"
#yAxisName = "YOUR COLUMN NAME HERE"

# See below for an example using the cereal CSV.
# Comment out the lines below, or edit them with your own title and axes.

title = "Cereal Knowledge"
xAxisName = "name"
yAxisName = "calories"

# Optional commands...

# df["<insert column name here>"] = df["<insert column name here>"].astype(str)
# df["<insert column name here>"] = df["<insert column name here>"].astype(float)

# color = "<insert column name here>" 

After setting the variables and running the cell above, run the cell below to draw the line graph.

In [None]:
lineFigure = px.line(df, x = xAxisName, y = yAxisName, title = title, color = None)
lineFigure.show()

In [None]:
# Run and edit this cell if you want to export the figure as an HTML file.
lineFigure.write_html("path/to/file.html")

<br>
<br>

## Bar Graph

In [20]:
xAxisName = "name"
yAxisName = "calories"
title = "Cereal Knowledge"

orientation = "v"  # Enter "v" for a vertical bar graph or "h" for a horizontal one.
                   # For "h", swap xAxisName and yAxisName so the numeric column is on the x axis.


# Optional commands...

# df["<insert column name here>"] = df["<insert column name here>"].astype(str)
# df["<insert column name here>"] = df["<insert column name here>"].astype(float)

# color = "<insert column name here>" 

barFigure = px.bar(df, x = xAxisName, y = yAxisName, title = title, orientation = orientation, color = None)
barFigure.show()

In [None]:
# Run and edit this cell if you want to export the figure as an HTML file.
barFigure.write_html("path/to/file.html")

<br>
<br>

## Scatter Plot

In [21]:
xAxisName = "name"
yAxisName = "calories"
title = "Cereals"


# Optional commands...

# df["<insert column name here>"] = df["<insert column name here>"].astype(str)
# df["<insert column name here>"] = df["<insert column name here>"].astype(float)

# color = "<insert column name here>" 

scatterFigure = px.scatter(df, x = xAxisName, y = yAxisName, title = title, color = None)
scatterFigure.show()

In [None]:
# Run and edit this cell if you want to export the figure as an HTML file.
scatterFigure.write_html("path/to/file.html")

<br><br>

## Pie Chart

In [None]:
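# A pie chart has no x and y axes: xAxisName is used as "names" (the column that labels
# the slices) and yAxisName as "values" (the numeric column that sets each slice's size).
xAxisName = "name"
yAxisName = "calories"
title = "Cereal Knowledge"


# Optional commands...

# df["<insert column name here>"] = df["<insert column name here>"].astype(str)
# df["<insert column name here>"] = df["<insert column name here>"].astype(float)

# color = "<insert column name here>" 


pieFigure = px.pie(df, names = xAxisName, values = yAxisName, title = title, color = None)
pieFigure.show()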

In [None]:
# Run and edit this cell if you want to export the figure as an HTML file.
pieFigure.write_html("path/to/file.html")