# Company Inc - Sales Data Analysis

## Relationship between Ship Mode and Time to Ship

<br>

### This notebook studies the relationship between Ship Mode and Time to Ship.
### -> Time to Ship is the difference between Order Date and Ship Date, in days.
### -> Ship Mode is divided into four classes: "Same Day", "First Class", "Second Class" and "Standard Class".
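
As a minimal sketch of the Time to Ship definition above, the helper below subtracts two dates and returns the gap in whole days. The function name `time_to_ship_days` and the sample dates are hypothetical and chosen only for illustration; the notebook itself computes this value in SQL.

```python
from datetime import date


def time_to_ship_days(order_date: date, ship_date: date) -> int:
    """Return the number of days between the order date and the ship date."""
    return (ship_date - order_date).days


# Hypothetical sample: ordered on 1 March 2024, shipped on 4 March 2024.
example_days = time_to_ship_days(date(2024, 3, 1), date(2024, 3, 4))
print(example_days)  # 3
```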

### Import the libraries

In [0]:
# Plotly is a Python library for visualizations
import plotly.express as px

### Get the data using a SQL query that joins the fact table "Order" with the two dimension tables "Ship" and "Ship_Mode"

In [0]:
# Get the data using Spark SQL, joining the fact table "Order" with the two dimension tables "Ship" and "Ship_Mode"
df = spark.sql("""

SELECT (sh.Ship_Date - ord.Order_Date) AS Time_to_Ship -- Difference between Order Date and Ship Date, in days
        ,sp.Ship_Mode -- Ship Mode, one of four classes: "Same Day", "First Class", "Second Class" and "Standard Class"
        ,CASE -- Encode Ship Mode as a numerical column (-1 marks an unrecognized value)
            WHEN sp.Ship_Mode = 'Same Day' THEN 0
            WHEN sp.Ship_Mode = 'First Class' THEN 1
            WHEN sp.Ship_Mode = 'Second Class' THEN 2
            WHEN sp.Ship_Mode = 'Standard Class' THEN 3
            ELSE -1
        END AS Ship_Mode_nb
FROM company.order ord -- Fact table: Order
INNER JOIN company.ship sh -- Dimension table: Ship
        ON ord.Ship_ID = sh.Ship_ID -- Join "Order" to "Ship" on the model key
INNER JOIN company.ship_mode sp -- Dimension table: Ship Mode
        ON sp.Ship_Mode_ID = sh.Ship_Mode_ID -- Join "Ship" to "Ship_Mode" on the model key

""")

### Convert the Spark DataFrame to a Pandas DataFrame, which is used for the Plotly charts and the correlation calculation

In [0]:
df_pd = df.toPandas()

### Box plot, using Plotly, to show the distribution of Time to Ship for each Ship Mode

In [0]:
fig = px.box(df_pd, x='Time_to_Ship', color='Ship_Mode')
fig.show()
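
A box plot condenses each group into a five-number summary: minimum, first quartile, median, third quartile and maximum. The sketch below computes that summary per Ship Mode with the standard library only. The rows are hypothetical toy data, not values from the company tables, and `box_summary` is an illustrative helper name. Quartile conventions differ between tools, so the quartiles here may not match the ones Plotly draws.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical (ship_mode, time_to_ship_days) rows, NOT taken from the company tables.
rows = [
    ("Same Day", 0), ("Same Day", 0), ("Same Day", 1),
    ("First Class", 1), ("First Class", 2), ("First Class", 3),
    ("Standard Class", 4), ("Standard Class", 5), ("Standard Class", 7),
]


def box_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


# Group the shipping times by Ship Mode.
grouped = defaultdict(list)
for ship_mode, days in rows:
    grouped[ship_mode].append(days)

# One five-number summary per Ship Mode.
summaries = {mode: box_summary(values) for mode, values in grouped.items()}
for mode, summary in summaries.items():
    print(mode, summary)
```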

### Histogram with a marginal box plot, using Plotly, to show the distribution of Time to Ship for each Ship Mode

In [0]:
fig = px.histogram(df_pd, x='Time_to_Ship', color='Ship_Mode', marginal='box')
fig.show()

### Box plot, using Plotly, of Time to Ship against the numerical encoding of Ship Mode

In [0]:
fig = px.box(df_pd, x='Time_to_Ship', y='Ship_Mode_nb', color='Ship_Mode')
fig.show()

### Calculate the Pearson, Spearman and Kendall correlations

In [0]:
# Pearson Correlation
pearson_corr = df_pd[['Time_to_Ship', 'Ship_Mode_nb']].corr(method='pearson')

# Spearman Correlation
spearman_corr = df_pd[['Time_to_Ship', 'Ship_Mode_nb']].corr(method='spearman')

# Kendall Correlation
kendall_corr = df_pd[['Time_to_Ship', 'Ship_Mode_nb']].corr(method='kendall')

# Display the results
print("Pearson Correlation:")
print(pearson_corr)

print("\nSpearman Correlation:")
print(spearman_corr)

print("\nKendall Correlation:")
print(kendall_corr)

Pearson Correlation:
              Time_to_Ship  Ship_Mode_nb
Time_to_Ship      1.000000      0.817761
Ship_Mode_nb      0.817761      1.000000

Spearman Correlation:
              Time_to_Ship  Ship_Mode_nb
Time_to_Ship      1.000000      0.771999
Ship_Mode_nb      0.771999      1.000000

Kendall Correlation:
              Time_to_Ship  Ship_Mode_nb
Time_to_Ship      1.000000      0.680643
Ship_Mode_nb      0.680643      1.000000
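
The three coefficients measure different things. Pearson captures linear association, while Spearman and Kendall depend only on the ordering of the values. The dependency-free sketch below shows how each can be computed, using hypothetical toy data rather than the company tables. The function names are illustrative. The Kendall function implements the tau-b variant, which corrects for ties; treating that as the variant behind the notebook's output is an assumption. Ties matter here because `Ship_Mode_nb` takes only four distinct values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted in neither total
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))


# Hypothetical toy data, NOT taken from the company tables.
days = [0, 0, 2, 3, 3, 5, 6]
mode_code = [0, 0, 1, 1, 2, 3, 3]

print("Pearson:", round(pearson(days, mode_code), 4))
print("Spearman:", round(spearman(days, mode_code), 4))
print("Kendall tau-b:", round(kendall_tau_b(days, mode_code), 4))
```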
