# Chapter 3

This notebook automates the creation of an analysis plan for both the Space of Organisation Analysis and Space of Project Analysis sections of Chapter 3.

The prompts are parameterised, so a new analysis plan can be generated for either section by supplying a new metric.

A Project Report Word document is also created using python-docx.

## Initial setup

In [1]:
import openai
import requests
import json
import os
import re
from docx import Document

In [2]:
# Read the API keys from environment variables rather than hard-coding secrets in the notebook
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
WEBPILOT_API_KEY = os.environ.get("WEBPILOT_API_KEY")

## Please provide metrics and industry to be investigated:

In [3]:
organisation_metric = "Time spent on sourcing stock per unit of stock"
project_metric = "The difference or percent change in summer sales revenue for musical instrument companies participating versus not attending summer music events and competitions"
industry = "vintage banjo"

## Functions to prompt ChatGPT and parse the response:

In [4]:
def gpt_standard_query(prompt, context=None):
    # Uses the legacy ChatCompletion interface (openai<1.0)
    openai.api_key = OPENAI_API_KEY
    messages = [
        {"role": "system", "content": "You are a skilled data analyst with experience in collecting, processing, and interpreting complex datasets. Your goal is to assist the user in devising data collection and analysis strategies."},
    ]
    # Only send the context message when one is supplied, instead of an empty user message
    if context:
        messages.append({"role": "user", "content": context})
    messages.append({"role": "user", "content": prompt})

    query_response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0.8
    )

    # Return the text of the first completion
    return query_response.choices[0].message['content']

In [5]:
def parse_into_points(text):
    # Strip leading and trailing whitespace
    text = text.strip()
    # Parse the JSON string into a list of Python dictionaries
    points = json.loads(text)
    return points
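
The parser above assumes the reply is nothing but a JSON array. As a hedged sketch only, the variant below tolerates prose or a code fence around the array and checks for the keys the later cells rely on. The name `parse_steps` and the validation rules are assumptions for illustration, not part of the original notebook.

```python
import json


def parse_steps(text):
    """Parse a model reply into a list of step dictionaries (illustrative sketch)."""
    # Keep only the outermost JSON array, ignoring any prose or fence around it
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end == -1 or end < start:
        raise ValueError("No JSON array found in the reply")
    steps = json.loads(text[start:end + 1])

    # Check that every step carries the keys used when writing the report
    required = {"Step", "Description", "Explanation"}
    for item in steps:
        missing = required - set(item)
        if missing:
            raise ValueError(f"Step is missing keys: {sorted(missing)}")
    return steps


reply = 'Here is the plan:\n[{"Step": 1, "Description": "Define scope", "Explanation": "Set the objective."}]'
steps = parse_steps(reply)
```

A reply that is cut off before the closing bracket will still fail in `json.loads`, so a truncated response has to be regenerated rather than repaired.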

## Set up the project report document

In [6]:
document = Document()
document.add_heading("3. Create an Analysis Plan")
document.add_heading("3.1. Plan for Data Collection and Analysis", 2)
document.add_heading("3.1.1. Space of Organisation Analysis", 3)
document.add_heading("3.1.1.1. The Role to Which the Recommendation Would Apply", 4)

<docx.text.paragraph.Paragraph at 0x285ed2f1d50>

## Create the Space of Organisation analysis plan

ChatGPT is first prompted to create a high-level analysis plan.

The output is requested in JSON format so that it can be parsed into individual steps.

The program then loops through each step of the plan and prompts ChatGPT to provide a detailed walkthrough of executing that step.
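
The two-phase flow described above can be sketched without any network access. In this minimal example `fake_query` is a hypothetical stand-in for the model call and returns canned text; `build_plan` and the shortened prompts are likewise assumptions used only to show the control flow.

```python
import json


def fake_query(prompt, context=None):
    """Hypothetical stand-in for the model call; returns canned text."""
    if context is None:
        # Phase 1 reply: a JSON array of high-level steps
        return json.dumps([
            {"Step": 1, "Description": "Define scope", "Explanation": "Set the objective."},
            {"Step": 2, "Description": "Collect data", "Explanation": "Gather public sources."},
        ])
    # Phase 2 reply: a walkthrough for one step
    return f"Walkthrough for: {prompt}"


def build_plan(query, plan_prompt):
    # Phase 1: ask for the high-level plan and parse the JSON reply
    high_level = query(plan_prompt)
    steps = json.loads(high_level)
    # Phase 2: ask for detail on each step, passing the full plan as context
    details = {}
    for step in steps:
        detail_prompt = f"Step {step['Step']}: {step['Description']}"
        details[step["Step"]] = query(detail_prompt, context=high_level)
    return steps, details


steps, details = build_plan(fake_query, "Create an analysis plan.")
```

Passing the query function in as an argument keeps the loop testable: the real model call can be swapped for the stub without changing `build_plan`.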

In [8]:
prompt = f"""You are an expert data analyst creating an analysis plan for a company you're helping
            in the {industry} industry.
            For this analysis plan, you're investigating this metric: {organisation_metric}.
            In your opinion, who would be the best person in the company to present this analysis plan to
            in order to have the greatest impact at the company?
            Define their role in the company."""
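
Because the same role prompt is used again later with the project metric, it could be produced by a small helper. The sketch below is one possible shape, assuming a hypothetical function name `build_role_prompt`; the wording is copied from the prompt above.

```python
def build_role_prompt(industry, metric):
    """Return the role-identification prompt for a given industry and metric."""
    return (
        "You are an expert data analyst creating an analysis plan for a company "
        f"you're helping in the {industry} industry. "
        f"For this analysis plan, you're investigating this metric: {metric}. "
        "In your opinion, who would be the best person in the company to present "
        "this analysis plan to in order to have the greatest impact at the company? "
        "Define their role in the company."
    )


example_prompt = build_role_prompt(
    "vintage banjo", "Time spent on sourcing stock per unit of stock"
)
```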

In [9]:
role1 = gpt_standard_query(prompt)

In [10]:
print(role1)

In order to have the greatest impact at the company, the best person to present the analysis plan to would be the Chief Operating Officer (COO) or the Director of Operations. 

The COO or Director of Operations is responsible for overseeing the overall operations of the company, including inventory management, procurement, and supply chain management. They have the authority and decision-making power to implement changes and allocate resources within the company. 

Presenting the analysis plan to the COO or Director of Operations will help ensure that the insights and recommendations are considered at a strategic level. They can assess the operational impact of the analysis, allocate resources accordingly, and make decisions that will optimize the time spent on sourcing stock per unit of stock.


In [11]:
document.add_paragraph(role1)

<docx.text.paragraph.Paragraph at 0x285ef81f9d0>

In [12]:
document.add_heading("3.1.1.2. The Metric of Value", 4)
document.add_paragraph(organisation_metric)

<docx.text.paragraph.Paragraph at 0x285f088a530>

In [13]:
organization_prompt = f"""You are an expert data analyst
                who excels at identifying successes and issues for businesses within an industry, 
                and using that information to 
                provide recommendations to decision makers which have a high impact.
                
                It is your job to come up with a detailed 
                data collection and analysis plan for the following metric: '{organisation_metric}'.
                
                The end goal will be to guide attention toward the differences 
                between example organizations, propose causes for these differences, 
                and ideally validate these causes with additional data. 
                The analysis here should be driven by publicly available data.
                
                Start with a high level description of the data collection and analysis plan 
                in a step by step format.
                Then progressively go into more detail, describing fully each of the steps
                involved in data collection and data analysis.
                
                Please list the steps as a JSON array in the format 
                [{{"Step": n, "Description": "...", "Explanation": "..."}}, {{ next step..}}]  where n is the step number 
                and "..." is the description and explanation of the step.
                """

In [14]:
high_level_description = gpt_standard_query(organization_prompt)

In [15]:
print(high_level_description)

[
  {
    "Step": 1,
    "Description": "Define the scope and objective of the analysis",
    "Explanation": "Clearly define what you want to achieve with the analysis and what specific aspects of 'Time spent on sourcing stock per unit of stock' you want to understand. This will help guide the entire data collection and analysis process."
  },
  {
    "Step": 2,
    "Description": "Identify the sample organizations for analysis",
    "Explanation": "Identify a diverse set of example organizations within the industry that will serve as the sample for analysis. These organizations should represent different sizes, locations, and business models to provide a comprehensive understanding of the metric."
  },
  {
    "Step": 3,
    "Description": "Collect publicly available data",
    "Explanation": "Gather publicly available data for the example organizations that may be relevant to the 'Time spent on sourcing stock per unit of stock' metric. This can include financial reports, annual state

In [16]:
# Parse the description into separate points
points = parse_into_points(high_level_description)

In [17]:
# Print the high level description to project report document
document.add_heading("3.1.1.3. Proposed Analysis Process", 4)
document.add_heading("3.1.1.3.1. High-level Description of Analysis Plan", 4)

for point in points:
    # One paragraph for the step title, one for its explanation
    document.add_paragraph("Step {}: {}".format(point["Step"], point["Description"]))
    document.add_paragraph(point["Explanation"])

In [18]:
# Print the detailed walkthrough to project report document
document.add_heading("3.1.1.3.2. Detailed walkthrough of each step", 4)

<docx.text.paragraph.Paragraph at 0x285ef806fb0>

In [19]:
# Iterate over each point and ask for more detail
for point in points:
    step = point['Step']
    description = point['Description']
    explanation = point['Explanation']
    
    detail_prompt = f"""In step {step} you proposed: '{description} - {explanation}'.

                      Based on this, please carry out this step using publicly available data sources.

                      Provide a detailed walkthrough of executing this step, including:

                      - The specific data sources and tools you would use

                      - Any code snippets to collect, process, or analyze the relevant data

                      - The key outputs, metrics, visualizations, or insights obtained from this step

                      Please provide enough detail so another analyst could reproduce your work based solely on this response.

                      Remember to include inline code snippets and output examples where applicable."""
    
    
    detailed_description = gpt_standard_query(detail_prompt, context=high_level_description)
    
    document.add_heading("Step "+str(step), 4)
    document.add_paragraph(detailed_description+"\n")
    
    print(detailed_description)

To define the scope and objective of the analysis for the "Time spent on sourcing stock per unit of stock" metric using publicly available data sources, we will follow these steps:

1. Clearly define the objective: Determine the specific objective of the analysis. For example, it could be to identify the average time spent on sourcing stock per unit of stock for different organizations within a specific industry.

2. Identify the data sources: Look for publicly available data sources that provide information relevant to the objective. Some potential data sources could include:

   - Financial reports: Check if the sample organizations' financial reports include information on inventory management, sourcing processes, or any related metrics.
   - Annual statements: Look for data on procurement or supply chain efficiency mentioned in annual statements.
   - Market research reports: Explore industry-specific reports that might provide insights into average sourcing times or efficiency ben

## Create the Space of Project analysis plan

The same steps are followed as in the previous analysis plan, but the prompts have been altered to reflect the different type of analysis required in this section.
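
Since both sections write their high-level plan in the same way, the shared part could be factored into one function. The following is a sketch under stated assumptions: `add_plan_steps` and `RecordingDocument` are hypothetical names, and the function relies only on `add_heading(text, level)` and `add_paragraph(text)`, the two python-docx calls already used in this notebook.

```python
def add_plan_steps(document, section_number, steps):
    """Write the high-level plan for one section into a document-like object."""
    document.add_heading(f"{section_number}.3. Proposed Analysis Process", 4)
    document.add_heading(
        f"{section_number}.3.1. High-level Description of Analysis Plan", 4
    )
    for step in steps:
        document.add_paragraph(f"Step {step['Step']}: {step['Description']}")
        document.add_paragraph(step["Explanation"])


class RecordingDocument:
    """Hypothetical stand-in that records calls instead of building a .docx."""

    def __init__(self):
        self.items = []

    def add_heading(self, text, level=1):
        self.items.append(("heading", level, text))

    def add_paragraph(self, text=""):
        self.items.append(("paragraph", text))


doc = RecordingDocument()
add_plan_steps(
    doc,
    "3.1.2",
    [{"Step": 1, "Description": "Pick companies", "Explanation": "Choose a sample."}],
)
```

In the notebook itself the real `document` object would be passed in place of `RecordingDocument`, once with `"3.1.1"` and once with `"3.1.2"`.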

In [20]:
document.add_heading("3.1.2. Space of Project Analysis", 3)
document.add_heading("3.1.2.1. The Role to Which the Recommendation Would Apply", 4)

<docx.text.paragraph.Paragraph at 0x285f084e410>

In [21]:
prompt = f"""You are an expert data analyst creating an analysis plan for a company you're helping
            in the {industry} industry.
            For this analysis plan, you're investigating this metric: {project_metric}.
            In your opinion, who would be the best person in the company to present this analysis plan to
            in order to have the greatest impact at the company?
            Define their role in the company."""

In [22]:
role2 = gpt_standard_query(prompt)

In [23]:
print(role2)

In order to have the greatest impact at the company, the best person to present this analysis plan to would likely be the Chief Marketing Officer (CMO) or the Marketing Director. Their role in the company is to oversee and manage all marketing activities and strategies.

As the CMO or Marketing Director, they have a deep understanding of the company's marketing goals, strategies, and target audience. They are responsible for making decisions related to advertising, promotions, and events. Therefore, they would be the most interested in understanding the impact of participating in summer music events and competitions on the company's sales revenue.

By presenting the analysis plan to the CMO or Marketing Director, you can effectively communicate the potential benefits and drawbacks of participating in summer music events and competitions. They can then use this information to make informed decisions about allocating marketing resources, setting sales targets, and developing effective st

In [24]:
document.add_paragraph(role2)

<docx.text.paragraph.Paragraph at 0x285f08067d0>

In [25]:
document.add_heading("3.1.2.2. The Metric of Value", 4)
document.add_paragraph(project_metric)

<docx.text.paragraph.Paragraph at 0x285ed2dd4b0>

In [26]:
project_prompt = f"""You are an expert data analyst skilled at gathering detailed information to uncover insights about a metric. 

                It is your job to come up with a detailed data collection and analysis plan focused on the metric: {project_metric}.

                The goal is to guide attention toward differences in {project_metric} values and propose causes for those differences.   

                This analysis requires manually gathering rich metadata about a small set of examples.

                Start with a high level description of how you would collect detailed data related to {project_metric} and analyze it in a step-by-step format.

                Then provide more details on each step involved in data collection and analysis.

                List the steps as a JSON object with keys for Step, Description, and Explanation fields. 

                The analysis should aim to uncover insights from in-depth case studies rather than starting with a large dataset. 

                Focus on manually gathering as much relevant metadata as possible about each example.

                Use this data to explore what factors may cause differences in {project_metric} between examples. 

                Validate ideas by checking consistency with any other available data.

                Provide enough detail so another analyst could reproduce your approach. 
                
                Please list the steps as a JSON array in the format 
                [{{"Step": n, "Description": "...", "Explanation": "..."}}, {{ next step..}}]  where n is the step number 
                and "..." is the description and explanation of the step.
                """

In [27]:
high_level_description2 = gpt_standard_query(project_prompt)

In [28]:
print(high_level_description2)

[
  {
    "Step": 1,
    "Description": "Identify a sample of musical instrument companies that have participated in summer music events and competitions and a sample of companies that have not participated.",
    "Explanation": "To compare the difference or percent change in summer sales revenue, we need to select similar companies that have participated in summer music events and competitions and companies that have not participated. This will help us isolate the impact of participation on sales revenue."
  },
  {
    "Step": 2,
    "Description": "Collect summer sales revenue data for each selected company for the past few years.",
    "Explanation": "We need historical sales revenue data for each selected company to calculate the difference or percent change in summer sales revenue. This data will serve as the dependent variable in our analysis."
  },
  {
    "Step": 3,
    "Description": "Gather information on the specific summer music events and competitions that each participati

In [29]:
# Parse the description into separate points
points2 = parse_into_points(high_level_description2)

In [30]:
# Print the high level description to project report document
document.add_heading("3.1.2.3. Proposed Analysis Process", 4)
document.add_heading("3.1.2.3.1. High-level Description of Analysis Plan", 4)

for point in points2:
    # One paragraph for the step title, one for its explanation
    document.add_paragraph("Step {}: {}".format(point["Step"], point["Description"]))
    document.add_paragraph(point["Explanation"])

In [31]:
# Print the detailed walkthrough to project report document
document.add_heading("3.1.2.3.2. Detailed walkthrough of each step", 4)

<docx.text.paragraph.Paragraph at 0x285ef81d540>

In [32]:
# Iterate over each point and ask for more detail
for point in points2:
    step = point['Step']
    description = point['Description']
    explanation = point['Explanation']
    
    detail_prompt = f"""In step {step} of the project analysis plan, you proposed: '{description} - {explanation}'.

                Based on this, please demonstrate carrying out this step to analyze the metric: {project_metric}.

                Provide a detailed walkthrough executing this step, including:

                - Describe specifically what data you would try to collect related to {project_metric} 

                - What sources and methods you would use to gather that data

                - Any code snippets to process or analyze the collected data

                - The key outputs, metrics, visualizations, or insights obtained from this step

                - How you would validate the insights from this step with other available data

                Please provide enough detail so another analyst could reproduce your work based solely on this response. 

                Aim to gather as much relevant metadata about each example as possible.

                Remember to include inline code/pseudocode and example outputs where applicable."""
    
    
    detailed_description = gpt_standard_query(detail_prompt, context=high_level_description2)
    
    document.add_heading("Step "+str(step), 4)
    document.add_paragraph(detailed_description+"\n")
    
    print(detailed_description)

To analyze the difference or percent change in summer sales revenue for musical instrument companies participating versus not attending summer music events and competitions, we would follow the steps outlined below:

Step 1: Identify a sample of participating and non-participating companies
To compare the impact of participation on sales revenue, we need to select similar companies that have participated in summer music events and competitions and companies that have not participated. This will help us isolate the effect of participation on sales revenue.

Data to collect:
- Company name: The name of each musical instrument company.
- Participation status: Whether the company has participated in summer music events and competitions or not.

Sources and methods to gather data:
1. Online databases: Check for databases that provide information on musical instrument companies and their participation status in summer music events and competitions.
2. Event organizers: Reach out to organizer

In [33]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
document.save(r'H:\Documents\Software Development\QUB Software Development\Data_analysis_module\Chapter3ProjectReport.docx')
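
To make the save step portable across machines, the output location could be built with `pathlib` instead of a fixed drive path. This is a minimal sketch; `report_path` and the `reports` directory name are assumptions for illustration.

```python
from pathlib import Path


def report_path(output_dir, filename="Chapter3ProjectReport.docx"):
    """Return the report path, creating the output directory if needed."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename
```

The report could then be written with `document.save(str(report_path("reports")))`, which places it in a `reports` folder relative to the notebook's working directory.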