### Prepping Data Challenge: PD X WOW Salesforce Opportunities (Week 23)
 
### Requirements
- Input the data
- For the Opportunity table:
  - Pivot the CreatedDate & CloseDate fields so that we have a row for when each opportunity Opened and a row for the ExpectedCloseDate of each opportunity 
    - Rename the Pivot 1 Values field to Date
    - Rename the Pivot 1 Names field to Stage and update the values
  - Update the Stage field so that if the opportunity has closed (see the StageName field) the ExpectedCloseDate is updated with the StageName 
  - Remove unnecessary fields
    - Hint: look at the fields in common with the Opportunity History table
  - Bring in the additional information from the Opportunity History table about when each opportunity moved between each stage 
  - Ensure each row has a SortOrder associated with it 
    - Opened rows should have a SortOrder of 0
    - ExpectedCloseDate rows should have a SortOrder of 11
- Remove unnecessary fields
- Remove duplicate rows that may have occurred when bringing together the two tables 
- Output the data 
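
The pivot requirement above (columns to rows, then renaming the pivoted names and values) maps fairly directly onto `DataFrame.melt`. The sketch below is only an illustration on a made-up two-row table, not the approach used in the cells that follow; the frame `opps`, its IDs and its dates are all assumptions.

```python
import pandas as pd

# Toy stand-in for the Opportunity table; the IDs and dates are made up.
opps = pd.DataFrame({
    "Id": ["A1", "B2"],
    "CreatedDate": pd.to_datetime(["2022-03-01", "2022-03-14"]),
    "CloseDate": pd.to_datetime(["2022-08-20", "2022-07-07"]),
    "StageName": ["Qualification", "Closed Won"],
})

# Pivot columns to rows: one row per opportunity per date field.
long = opps.melt(
    id_vars=["Id", "StageName"],
    value_vars=["CreatedDate", "CloseDate"],
    var_name="Stage",
    value_name="Date",
)

# Relabel the pivoted field names and attach the fixed sort orders.
long["Stage"] = long["Stage"].map(
    {"CreatedDate": "Opened", "CloseDate": "ExpectedCloseDate"}
)
long["SortOrder"] = long["Stage"].map({"Opened": 0, "ExpectedCloseDate": 11})

print(long)
```

`melt` stacks the `value_vars` in the order given, so all Opened rows come first and all ExpectedCloseDate rows second.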

In [1]:
import pandas as pd

In [2]:
#input the data
df1 = pd.read_csv('wk23-Opportunity History.csv', parse_dates = ['CreatedDate'], dayfirst=True)
df2 = pd.read_csv('wk23-Opportunity.csv', parse_dates = ['CreatedDate','CloseDate'], dayfirst=True)
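
To check what `parse_dates` and `dayfirst=True` do without the challenge files, the same call can be tried on an in-memory CSV. The two rows below are hypothetical and assume the dates are written day-first (`DD/MM/YYYY`); the real files may use a different format.

```python
import io
import pandas as pd

# Hypothetical two-row extract; the real file's date format may differ.
raw = io.StringIO(
    "OppID,CreatedDate,StageName,SortOrder\n"
    "X1,13/03/2022,Qualification,2\n"
    "X1,06/04/2022,Negotiation/Review,8\n"
)

# dayfirst=True makes 06/04/2022 parse as 6 April rather than 4 June.
history = pd.read_csv(raw, parse_dates=["CreatedDate"], dayfirst=True)

print(history.dtypes)
print(history)
```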

In [3]:
df1.head()

Unnamed: 0,OppID,CreatedDate,StageName,SortOrder
0,0068d000002OrbYAAS,2022-04-06,Negotiation/Review,8
1,0068d000002OrbYAAS,2022-07-02,Prospecting,1
2,0068d000002OrbYAAS,2022-07-07,Closed Won,9
3,0068d000002OrbJAAS,2022-03-13,Qualification,2
4,0068d000002OrbJAAS,2022-07-13,Value Proposition,4


In [4]:
df1.rename(columns = {'CreatedDate' : 'Date', 'StageName':'Stage'}, inplace=True)

In [5]:
df2.head()

Unnamed: 0,CreatedDate,Id,IsDeleted,AccountId,IsPrivate,Name,Amount,Probability,ExpectedRevenue,TotalOpportunityQuantity,...,HasOpenActivity,HasOverdueTask,LastAmountChangedHistoryId,LastCloseDateChangedHistoryId,DeliveryInstallationStatus__c,TrackingNumber__c,OrderNumber__c,CurrentGenerators__c,MainCompetitors__c,StageName
0,2022-03-14,0068d000002OrZqAAK,False,0018d000007K0NXAA0,False,seize dot-com e-services,454160.0,50,227080.0,706.0,...,False,False,0088d000009OWe8AAG,,,,,,,Closed Lost
1,2022-05-01,0068d000002OrZrAAK,False,0018d000007K0PdAAK,False,extend virtual experiences,493470.0,10,49347.0,737.0,...,False,False,0088d000009OWdAAAW,,,,,,,Closed Won
2,2022-03-03,0068d000002OrZsAAK,False,0018d000007K0OaAAK,False,revolutionize world-class platforms,625850.0,10,62585.0,699.0,...,False,False,0088d000009OWdBAAW,,,,,,,Negotiation/Review
3,2022-03-15,0068d000002OrZtAAK,False,0018d000007K0OcAAK,False,utilize end-to-end vortals,826630.0,60,495978.0,1080.0,...,False,False,0088d000009OWauAAG,,,,,,,Closed Lost
4,2022-03-01,0068d000002OrZuAAK,False,0018d000007K0OVAA0,False,disintermediate extensible platforms,679000.0,20,135800.0,908.0,...,False,False,0088d000009OWdCAAW,,,,,,,Qualification


In [6]:
#For the Opportunity table:
#- Pivot the CreatedDate & CloseDate fields so that we have a row for when each opportunity Opened and 
#a row for the ExpectedCloseDate of each opportunity 
df3 = df2[['Id','CreatedDate']].assign(Stage='Opened',SortOrder=0)

In [7]:
df3.head()

Unnamed: 0,Id,CreatedDate,Stage,SortOrder
0,0068d000002OrZqAAK,2022-03-14,Opened,0
1,0068d000002OrZrAAK,2022-05-01,Opened,0
2,0068d000002OrZsAAK,2022-03-03,Opened,0
3,0068d000002OrZtAAK,2022-03-15,Opened,0
4,0068d000002OrZuAAK,2022-03-01,Opened,0


In [8]:
df3.rename(columns = {'Id':'OppID','CreatedDate' : 'Date'}, inplace=True)

In [9]:
#Build the ExpectedCloseDate rows (SortOrder 11) from the CloseDate field.
#Opportunities that have already closed (StageName contains 'Closed') are left out here:
#their 'Closed Won' / 'Closed Lost' rows are expected to come from the Opportunity History table (df1).
df4 = df2[~df2['StageName'].str.contains('Closed')][['Id', 'CloseDate']].assign(Stage='ExpectedCloseDate', SortOrder=11)

In [10]:
df4.head()

Unnamed: 0,Id,CloseDate,Stage,SortOrder
2,0068d000002OrZsAAK,2022-12-06,ExpectedCloseDate,11
4,0068d000002OrZuAAK,2022-08-20,ExpectedCloseDate,11
5,0068d000002OrZvAAK,2022-03-18,ExpectedCloseDate,11
7,0068d000002OrZxAAK,2022-09-25,ExpectedCloseDate,11
8,0068d000002OrZyAAK,2022-09-18,ExpectedCloseDate,11


In [11]:
df4.rename(columns = {'Id':'OppID','CloseDate' : 'Date'}, inplace=True)
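
The requirement to update the Stage field for closed opportunities can also be read literally: keep the CloseDate row and relabel it with the StageName instead of filtering it out. A minimal sketch of that reading follows, on invented data. The names `opps`, `is_closed` and `close_rows` are hypothetical, and leaving SortOrder missing for closed rows is an assumption of this sketch, not something the challenge specifies.

```python
import pandas as pd

# Toy Opportunity rows; IDs and dates are invented for illustration.
opps = pd.DataFrame({
    "Id": ["A1", "B2", "C3"],
    "CloseDate": pd.to_datetime(["2022-08-20", "2022-07-07", "2022-07-18"]),
    "StageName": ["Qualification", "Closed Won", "Closed Lost"],
})

is_closed = opps["StageName"].str.startswith("Closed")

close_rows = pd.DataFrame({
    "OppID": opps["Id"],
    "Date": opps["CloseDate"],
    # Closed opportunities keep their StageName; open ones become ExpectedCloseDate.
    "Stage": opps["StageName"].where(is_closed, "ExpectedCloseDate"),
})

# Only ExpectedCloseDate rows get SortOrder 11; closed rows are left missing
# in this sketch (an assumption), to be filled from the history table.
close_rows["SortOrder"] = pd.Series(
    11, index=close_rows.index, dtype="Int64"
).where(~is_closed)

print(close_rows)
```

With this reading, a closed row may repeat a row already present in the Opportunity History table, which is where the later duplicate-removal step would matter.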

In [12]:
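#Bring in the additional information from the Opportunity History table about when each opportunity moved between each stage
#Remove duplicate rows that may have occurred when bringing together the two tables
output = pd.concat([df1,df3,df4], ignore_index=True).drop_duplicates(ignore_index=True)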

In [13]:
output.head(10)

Unnamed: 0,OppID,Date,Stage,SortOrder
0,0068d000002OrbYAAS,2022-04-06,Negotiation/Review,8
1,0068d000002OrbYAAS,2022-07-02,Prospecting,1
2,0068d000002OrbYAAS,2022-07-07,Closed Won,9
3,0068d000002OrbJAAS,2022-03-13,Qualification,2
4,0068d000002OrbJAAS,2022-07-13,Value Proposition,4
5,0068d000002OrbJAAS,2022-07-18,Closed Lost,10
6,0068d000002OrbZAAS,2022-03-15,Prospecting,1
7,0068d000002OrbZAAS,2022-04-16,Prospecting,1
8,0068d000002OrbZAAS,2022-05-02,Prospecting,1
9,0068d000002OrbZAAS,2022-05-10,Value Proposition,4


In [14]:
#output the data 
output.to_excel('wk23-output.xlsx', index=False)