# TL;DR: How to use Python to generate a graph based on the Van Westendorp Price Sensitivity Meter (hereinafter PSM)
## Prerequisite: The Data

To generate the Van Westendorp chart, you only need a dataset from a survey with the following questions:

* What is the price that would make you refrain from buying the product because it is too expensive?
* At what price would you still buy the product even though it seems expensive to you?
* At what price would you consider the product cheap and still buy it?
* At what price would the product be so cheap that you would doubt its quality and not buy it?

The table should have 4 columns:

| Too Cheap | Cheap | Expensive | Too Expensive |
| --------- | ----- | --------- | ------------- |
|     1     |   2   |     3     |       4       |



To generate the Van Westendorp chart, you must add your file to this directory. It accepts csv, json, and excel files.
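Since every respondent answers all four questions, a common sanity check (not part of this notebook's code) is that each row satisfies too cheap <= cheap <= expensive <= too expensive. The sketch below flags rows that break that order; `inconsistent_rows` is a hypothetical helper, and it assumes the column labels have already been trimmed and lowercased, as the notebook does later.

```python
import pandas as pd

# Assumed column order, lowercased as in the rest of the notebook
ORDER = ["too cheap", "cheap", "expensive", "too expensive"]


def inconsistent_rows(df):
    """Hypothetical helper: return respondents whose four answers are not ordered."""
    ordered = df[ORDER]
    # diff(axis=1) subtracts each column from the next one; a negative step means,
    # for example, a "cheap" price higher than the "expensive" price
    steps = ordered.diff(axis=1).iloc[:, 1:]
    return df[(steps < 0).any(axis=1)]
```

Whether to drop, fix or keep such rows is a judgment call that depends on the survey.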


---- 


Given a dataset where respondents give their "too cheap", "cheap", "expensive" and "too expensive" prices for the product under study:
* We use the `pandas` library to load the survey file into a data frame and add two new columns:
  - "CPer", the cumulative relative frequency of each row, computed as (row position + 1) / number of rows, and
  - "1 - CPer", which equals 1 minus the CPer of each row
* With that information, we use the `matplotlib` library to plot the lines:
  - "too cheap" and "cheap", which have a negative slope, and
  - "expensive" and "too expensive", which have a positive slope (more about this later)
* We find the four intersections and label them:
  - The intersection of "too cheap" and "too expensive" is the Optimal Price Point (OPP)
  - The intersection of "too cheap" and "expensive" is the Point of Marginal Cheapness (PMC)
  - The intersection of "cheap" and "too expensive" is the Point of Marginal Expensiveness (PME), and
  - The intersection of "cheap" and "expensive" is the Indifference Point (IP)
* Finally, we append a block of text below the chart with the value of each intersection
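All four price points come down to finding where a rising curve crosses a falling one. As a toy illustration with made-up prices (not survey data), shapely's `LineString.intersection` returns the crossing point of two straight lines:

```python
from shapely.geometry import LineString

# Made-up curves: one rising like "expensive", one falling like "cheap"
rising = LineString([(100, 0.0), (300, 1.0)])
falling = LineString([(100, 1.0), (300, 0.0)])

# intersection() returns the geometry shared by both lines, here a single Point
crossing = rising.intersection(falling)
print(crossing.geom_type, crossing.x, crossing.y)
```

The x coordinate is the price where both curves reach the same share of respondents; for these two symmetric lines that is a price of 200 at 50%.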

If you are only interested in using the code, just add your survey file to this directory and follow the instructions in the following cell.

Otherwise, if you want to learn how it works, open the `van_westendorp.py` file and first try to figure it out by yourself. Then continue reviewing the remaining cells, which include the code and tests with their explanations.

Tests can also be launched from console:
```bash
cd van_westendorp_en
pytest
```

In [None]:
%run van_westendorp.py 


file = 'vwsurvey.csv' # input here the name of your file
# Default currency is EUR; to change it, pass the currency as a second argument of type string
# van_westendorp(file, "USD")
van_westendorp(file) 
# A PNG of the plot is saved to the current directory so you can download it. You can also copy it 
# to the clipboard by clicking the "copy" icon below

# This first block is optional
Use it to simulate survey data and store it in files if you don't have your own data yet. `generate_random_data(n)`, where `n` is an int with the number of rows desired, generates a CSV, a JSON and an XLSX file (`vwsurvey.csv`, `vwsurvey.json` and `vwsurvey.xlsx`), all with the same information.

In [None]:
from random import randint
import json
import csv
import pandas as pd


def generate_random_data(n):
  data = []
  for _ in range(n):
    data.append({
        "Too Cheap": randint(10, 70),
        "Cheap": randint(20, 80),
        "Expensive": randint(30, 90),
        "Too Expensive": randint(40, 100)
    })

  with open('vwsurvey.json', 'w') as output_file_json:
    json.dump(data, output_file_json)

  with open('vwsurvey.csv', 'w', newline='') as output_file:
    writer = csv.DictWriter(output_file, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)

  # to_excel opens and writes the file by itself (it needs an engine such as openpyxl),
  # so there is no need to open the file first in text mode
  pd.DataFrame(data).to_excel('vwsurvey.xlsx', index=False)

  return data  # also return the rows so the caller can inspect them

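`generate_random_data` draws each answer independently, so a simulated respondent can give a "too cheap" price above their "too expensive" one. If you want more realistic test data, one option is to draw four prices and sort them per respondent. `generate_consistent_data` and its `low`, `high` and `seed` parameters are hypothetical, not part of the notebook:

```python
import random

import pandas as pd

COLUMNS = ["Too Cheap", "Cheap", "Expensive", "Too Expensive"]


def generate_consistent_data(n, low=10, high=100, seed=None):
    """Hypothetical variant of generate_random_data with ordered answers per respondent."""
    rng = random.Random(seed)  # seeded generator so results can be reproduced
    rows = []
    for _ in range(n):
        # Sorting guarantees too cheap <= cheap <= expensive <= too expensive
        prices = sorted(rng.randint(low, high) for _ in range(4))
        rows.append(dict(zip(COLUMNS, prices)))
    return pd.DataFrame(rows, columns=COLUMNS)
```

The result can be saved with `to_csv("vwsurvey.csv", index=False)` and used exactly like the random files.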

In [None]:
data = generate_random_data(50)

In [None]:
# Testing configuration
import pytest
import ipytest
ipytest.autoconfig()


Cells that are collapsed and start with `%%ipytest -qq` are tests. You can expand them to check what each test asserts.

In [None]:
%%ipytest -qq
# Test generate_random_data(n)
def test_output():
    columns = ['Too Cheap', 'Cheap', 'Expensive', 'Too Expensive']
    n = randint(1, 500)
    generate_random_data(n)
    df_csv = pd.read_csv('vwsurvey.csv')
    assert df_csv.shape == (n, 4)
    assert set(df_csv.columns) == set(columns)
    df_json = pd.read_json('vwsurvey.json')
    assert df_json.shape == (n, 4)
    assert set(df_json.columns) == set(columns)
    df_excel = pd.read_excel('vwsurvey.xlsx')
    assert df_excel.shape == (n, 4)
    assert set(df_excel.columns) == set(columns)

## First step: prepare data to be plotted

In [1]:
import pandas as pd
import numpy as np


prices = {'Too Cheap': [100,120,200,200,300,100,100,300,100,350,340,450,100,257,109,109,280,400,250,200],
          'Cheap': [150,200,250,300,340,190,200,350,120,360,360,460,110,388,299,129,350,410,260,240],
          'Expensive': [400,400,450,350,400,200,300,370,180,370,490,490,130,433,399,149,400,420,270,280],
          'Too Expensive': [500,480,500,400,490,300,500,380,200,380,500,500,140,499,422,199,410,430,280,300],
        }
# We use the dictionary to create a data frame with 20 rows
df = pd.DataFrame(prices)
# Trims and lowercases the columns' labels
df.columns = df.columns.str.strip().str.lower()
# Two ways of getting the number of rows: index.stop only works with a default RangeIndex, len() always works
print(f'Data Frame length is {df.index.stop}')
print(f'Data Frame length is {len(df)}')

# Creates two new columns: "CPer" (Cumulative Percentage) and its complement "1 - CPer"
df['CPer'] = (np.arange(1, df.index.stop + 1, 1)/df.index.stop).round(3)
# Using numpy's np.arange(start, stop, step) we create an array that starts at 1 and stops before the
# data frame's length + 1 (so the last value is the length), increasing by one unit at each step.
# Each value is then divided by the data frame's length and rounded to three decimals:
# The first value is round(1/20, 3) = 0.05
# The second value is round(2/20, 3) = 0.10
# And so on until the last one: round(20/20, 3) = 1.00
df['1 - CPer'] = 1 - df['CPer']
# To obtain the inverse value, we subtract the value obtained before from 1

df

Data Frame length is 20
Data Frame length is 20


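|   | too cheap | cheap | expensive | too expensive | CPer | 1 - CPer |
|---|-----------|-------|-----------|---------------|------|----------|
| 0 | 100 | 150 | 400 | 500 | 0.05 | 0.95 |
| 1 | 120 | 200 | 400 | 480 | 0.1 | 0.9 |
| 2 | 200 | 250 | 450 | 500 | 0.15 | 0.85 |
| 3 | 200 | 300 | 350 | 400 | 0.2 | 0.8 |
| 4 | 300 | 340 | 400 | 490 | 0.25 | 0.75 |
| 5 | 100 | 190 | 200 | 300 | 0.3 | 0.7 |
| 6 | 100 | 200 | 300 | 500 | 0.35 | 0.65 |
| 7 | 300 | 350 | 370 | 380 | 0.4 | 0.6 |
| 8 | 100 | 120 | 180 | 200 | 0.45 | 0.55 |
| 9 | 350 | 360 | 370 | 380 | 0.5 | 0.5 |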

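Here `df.index.stop` works because `pd.DataFrame(prices)` gets a default RangeIndex. As a sketch on made-up data (not the survey), this shows why `len(df)` is the safer way to compute CPer once incomplete rows are dropped:

```python
import numpy as np
import pandas as pd

# Made-up answers for one question, with one missing value
toy = pd.DataFrame({"too cheap": [10, 20, None, 40, 50]})

# dropna() returns a NEW data frame and keeps the original row labels (0, 1, 3, 4),
# so its index is no longer a RangeIndex and has no .stop attribute
clean = toy.dropna()
print(clean.index)

# reset_index(drop=True) restores the labels 0..n-1; len() works in both cases
clean = clean.reset_index(drop=True)
n = len(clean)
clean["CPer"] = (np.arange(1, n + 1) / n).round(3)
clean["1 - CPer"] = 1 - clean["CPer"]
print(clean)
```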

In [4]:
df.describe()


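|       | too cheap | cheap | expensive | too expensive | CPer | 1 - CPer |
|-------|-----------|-------|-----------|---------------|------|----------|
| count | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 218.25 | 273.3 | 344.05 | 390.5 | 0.525 | 0.475 |
| std | 113.102131 | 103.879535 | 109.42023 | 116.343638 | 0.295804 | 0.295804 |
| min | 100.0 | 110.0 | 130.0 | 140.0 | 0.05 | 0.0 |
| 25% | 106.75 | 197.5 | 277.5 | 300.0 | 0.2875 | 0.2375 |
| 50% | 200.0 | 279.5 | 384.5 | 416.0 | 0.525 | 0.475 |
| 75% | 300.0 | 352.5 | 405.0 | 499.25 | 0.7625 | 0.7125 |
| max | 450.0 | 460.0 | 490.0 | 500.0 | 1.0 | 0.95 |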


In [None]:
# If we plot these two columns as separate lines, with their indexes as x coordinates
# and their values as y coordinates, we can see that CPer is a line with a positive
# slope while "1 - CPer" has a negative slope
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter # to format the y axis as percentage
fig, ax = plt.subplots()

ax.plot(df["CPer"].index, df['CPer'], color="green", linestyle="dashed", label="Cumulative %")
ax.plot(df['1 - CPer'].index, df['1 - CPer'], color="orange", label="Inverse Cumulative %")
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) # to format the y axis as percentage
ax.legend()
plt.show()


In [None]:
# Values can repeat. In this case, 500 is the most frequent "too expensive" value
sorted_too_expensive = sorted(df["too expensive"]) # sorts this column and stores it in a new variable
new_df = pd.DataFrame() # creates a new data frame
new_df["sorted_too_expensive"] = sorted_too_expensive # new column with the sorted items
new_df["CPer"] = df["CPer"] # new column with the CPer values
new_df # new data frame with the sorted "too expensive" values and CPer


That's the data we will plot later. But first, just to practice data wrangling a little more,
let's compute the frequency of each value

In [None]:
counts = new_df['sorted_too_expensive'].value_counts().sort_index()
print(counts)

# This shows each "too expensive" price given by respondents and its frequency. 
# In this case, 500 is the most frequent response, given 5 times

We create another data frame to evaluate frequency

In [None]:
single_df = pd.DataFrame() # new data frame
cumulative_sum = counts.cumsum() # "too expensive" price as key, accumulated frequency as value
print(cumulative_sum)

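The per-respondent CPer column (one step of 1/n per answer) and the `value_counts()`/`cumsum()` route describe the same cumulative distribution. The sketch below, on made-up answers, checks that both agree at the last occurrence of each price; repeated prices only add extra points at the same x in the per-respondent version.

```python
import numpy as np
import pandas as pd

# Made-up "too expensive" answers with repeated prices
answers = pd.Series([300, 200, 500, 300, 500, 500])
n = len(answers)

# Per-respondent cumulative share (the CPer approach)
sorted_answers = answers.sort_values().to_numpy()
cper = np.arange(1, n + 1) / n

# Per-price cumulative share: share of answers at or below each distinct price
counts = answers.value_counts().sort_index()
cum_share = counts.cumsum() / counts.sum()

# Position of the last occurrence of each price in the sorted answers
last_idx = [np.flatnonzero(sorted_answers == p)[-1] for p in cum_share.index]
print(pd.DataFrame({"price": cum_share.index,
                    "cum_share": cum_share.to_numpy(),
                    "cper_at_last": cper[last_idx]}))
```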

In [None]:
single_df["price"] = cumulative_sum.index # new column with prices
single_df["frequency"] = counts.values # new column with frequency values
total = counts.sum() # total number of respondents (20), not the number of distinct prices (14)
single_df["single relative frequency"] = (counts.values/total*100).round(1) # % of respondents who gave each price
single_df["cumulative absolute frequency"] = cumulative_sum.values 
single_df["cumulative relative frequency (%)"] = (cumulative_sum.values/total).round(3) # share of respondents at or below each price
single_df["inverse cumulative relative (%)"] = 1 - single_df["cumulative relative frequency (%)"]
single_df.info()
# The new data frame has 14 entries, since now we have grouped the responses by price

In the next cell we'll see the full `single_df` we built for the "too expensive" responses, sorted by price in ascending order.
Later, when we create the chart, we will draw one line per column ("too cheap", "cheap", "expensive", "too expensive"), using its values sorted in ascending order as the x coordinates.
Since we want the "too cheap" and "cheap" lines to intersect the "expensive" and "too expensive" lines, we use the inverse cumulative percentage for "too cheap" and "cheap" to get lines with a negative slope (y gets smaller as x grows), and the cumulative percentage for the other two. 

In [None]:
single_df

Now we plot the line

In [None]:
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(single_df["price"], single_df["cumulative relative frequency (%)"])
ax.set_xlabel('Price')
ax.set_ylabel('Cumulative Percentage (%)')
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) 
plt.show()


We can see that (with price as the x coordinate and cumulative % as the y coordinate):
* first point = 140, 5%
* second point = 199, 10%
* third point = 200, 15%

Between the first and second points, x changes by 59 while y rises by 5 percentage points. Between the second and third points, x changes by only 1 while y rises by the same 5 points, which is why that part of the line is very steep.

## Second step: plot the lines

Now we return to the original data frame and plot all the lines

In [None]:
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df['too expensive'].sort_values(), df['CPer']) # positive slope
ax.plot(df['expensive'].sort_values(), df['CPer']) # positive slope
ax.plot(df['cheap'].sort_values(), df['1 - CPer']) # negative slope
ax.plot(df['too cheap'].sort_values(), df['1 - CPer']) # negative slope

ax.legend(['too expensive', 'expensive', 'cheap',
            'too cheap'], loc="best") # set the name of legends
ax.set_title("Van Westendorp's Price Sensitivity Meter", # set title
          pad=10, size=18, fontweight='bold')

ax.set_xlabel('Price: EUR') # set x axis label, later we will add a dynamic value set by user
ax.set_ylabel('Number of respondents (cumulative %)') # set y axis label

ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) # formats y axis label as percentage

ax.grid(True) # adds a grid
plt.show()
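The same four `ax.plot` calls are repeated in every cell below, and the legend relies on the calls staying in the same order as the label list. One way to avoid both is to compute the curves once and pass each label to `ax.plot`. `psm_curves` is a hypothetical helper that reproduces the x/y pairs used in this notebook (sorted prices against CPer or 1 - CPer):

```python
import numpy as np
import pandas as pd


def psm_curves(df):
    """Hypothetical helper: (x, y) arrays for the four PSM curves."""
    n = len(df)
    rising = np.arange(1, n + 1) / n  # same values as CPer (before rounding)
    falling = 1 - rising              # same values as 1 - CPer
    return {
        "too expensive": (np.sort(df["too expensive"].to_numpy()), rising),
        "expensive": (np.sort(df["expensive"].to_numpy()), rising),
        "cheap": (np.sort(df["cheap"].to_numpy()), falling),
        "too cheap": (np.sort(df["too cheap"].to_numpy()), falling),
    }
```

Usage: `for label, (x, y) in psm_curves(df).items(): ax.plot(x, y, label=label)`, followed by `ax.legend(loc="best")`.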

## Third step: find and label intersections

In [None]:
import shapely
from shapely.geometry import LineString

##### START PREVIOUS CODE #######
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df['too expensive'].sort_values(), df['CPer']) # positive slope
ax.plot(df['expensive'].sort_values(), df['CPer']) # positive slope
ax.plot(df['cheap'].sort_values(), df['1 - CPer']) # negative slope
ax.plot(df['too cheap'].sort_values(), df['1 - CPer']) # negative slope

ax.legend(['too expensive', 'expensive', 'cheap',
            'too cheap'], loc="best") # set the name of legends
ax.set_title("Van Westendorp's Price Sensitivity Meter", # set title
          pad=10, size=18, fontweight='bold')

ax.set_xlabel('Price: EUR') # set x axis label, later we will add a dynamic value set by user
ax.set_ylabel('Number of respondents (cumulative %)') # set y axis label

ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) # formats y axis label as percentage

ax.grid(True) # adds a grid
##### END PREVIOUS CODE #######



# We use the intersection method of shapely's LineString objects

too_expensive = LineString(list(zip(df['too expensive'].sort_values(), df['CPer'])))
expensive = LineString(list(zip(df['expensive'].sort_values(), df['CPer'])))
cheap = LineString(list(zip(df['cheap'].sort_values(), df['1 - CPer'])))
too_cheap = LineString(list(zip(df['too cheap'].sort_values(), df['1 - CPer'])))

"""
  - Intersection of "cheap" and expensive" is the Indifference Point(IP)
  - Intersection of "cheap" and "too expensive" is the Point of Marginal Expensiveness (PME) 
  - Intersection of "too cheap" and "expensive" is the Point of Marginal Cheapness (PMC)
  - Intersection of "too cheap" and "too expensive" is the Optimal Price Point (OPP)
"""

intersection_1 = expensive.intersection(cheap) # IP
intersection_2 = too_expensive.intersection(cheap) # PME
intersection_3 = expensive.intersection(too_cheap) # PMC
intersection_4 = too_expensive.intersection(too_cheap) # OPP
intersection_points = [intersection_1, intersection_2, intersection_3, intersection_4] # stores all intersections in a list

for i, intersection in enumerate(intersection_points):
  if(type(intersection) != shapely.geometry.point.Point): 
    intersection_points[i] = intersection.interpolate(0)
# Lines can sometimes overlap along a segment. In that case we take the first point of the overlap as the intersection

indicators = ['ro', 'go', 'yo', 'bo'] # red, green, yellow and blue circle markers


for point, indicator in zip(intersection_points, indicators):
  ax.plot(*point.xy, indicator)

# Round the x coordinate of each point for the text labels

IP = round(intersection_points[0].x)
PME = round(intersection_points[1].x)
PMC = round(intersection_points[2].x)
OPP = round(intersection_points[3].x)

# Here you fine tune the text tag position
ax.annotate('IP', xy=(intersection_points[0].x + 2.5, intersection_points[0].y - 0.02))
ax.annotate('PME', xy=(intersection_points[1].x + 2.5, intersection_points[1].y - 0.02))
ax.annotate('PMC', xy=(intersection_points[2].x + 2.5, intersection_points[2].y - 0.02))
ax.annotate('OPP', xy=(intersection_points[3].x + 2.5, intersection_points[3].y - 0.02))
plt.show()
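The loop above replaces any non-`Point` result with `intersection.interpolate(0)`, which covers an overlapping segment (a `LineString`). Shapely's `interpolate` is meant for linear geometries, though, so a `MultiPoint` (two curves crossing twice) or an empty result (curves that never meet) would likely fail there. A more defensive sketch, using a hypothetical `first_intersection_point` helper, collects every coordinate of the intersection and keeps the one with the lowest price; choosing the lowest price is an assumption, not the notebook's rule:

```python
from shapely.geometry import LineString, Point


def first_intersection_point(a, b):
    """Hypothetical helper: left-most (lowest x) point where curves a and b meet."""
    inter = a.intersection(b)
    if inter.is_empty:
        raise ValueError("The curves do not intersect")
    # Multi-part results (MultiPoint, MultiLineString, GeometryCollection) expose .geoms
    if inter.geom_type.startswith("Multi") or inter.geom_type == "GeometryCollection":
        parts = list(inter.geoms)
    else:
        parts = [inter]
    coords = []
    for part in parts:
        coords.extend(part.coords)  # a Point has one coordinate, a line several
    x, y = min(coords, key=lambda c: c[0])
    return Point(x, y)
```

Usage: `intersection_points = [first_intersection_point(expensive, cheap), ...]` in the same IP, PME, PMC, OPP order.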

## Fourth step: append the summary text

In [None]:
##### START PREVIOUS CODE #######
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df['too expensive'].sort_values(), df['CPer']) # positive slope
ax.plot(df['expensive'].sort_values(), df['CPer']) # positive slope
ax.plot(df['cheap'].sort_values(), df['1 - CPer']) # negative slope
ax.plot(df['too cheap'].sort_values(), df['1 - CPer']) # negative slope

ax.legend(['too expensive', 'expensive', 'cheap',
            'too cheap'], loc="best") # set the name of legends
ax.set_title("Van Westendorp's Price Sensitivity Meter", # set title
          pad=10, size=18, fontweight='bold')

ax.set_xlabel('Price: EUR') # set x axis label, later we will add a dynamic value set by user
ax.set_ylabel('Number of respondents (cumulative %)') # set y axis label

ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) # formats y axis label as percentage

ax.grid(True) # adds a grid




# We use the intersection method of shapely's LineString objects

too_expensive = LineString(list(zip(df['too expensive'].sort_values(), df['CPer'])))
expensive = LineString(list(zip(df['expensive'].sort_values(), df['CPer'])))
cheap = LineString(list(zip(df['cheap'].sort_values(), df['1 - CPer'])))
too_cheap = LineString(list(zip(df['too cheap'].sort_values(), df['1 - CPer'])))

"""
  - Intersection of "cheap" and expensive" is the Indifference Point(IP)
  - Intersection of "cheap" and "too expensive" is the Point of Marginal Expensiveness (PME) 
  - Intersection of "too cheap" and "expensive" is the Point of Marginal Cheapness (PMC)
  - Intersection of "too cheap" and "too expensive" is the Optimal Price Point (OPP)
"""

intersection_1 = expensive.intersection(cheap) # IP
intersection_2 = too_expensive.intersection(cheap) # PME
intersection_3 = expensive.intersection(too_cheap) # PMC
intersection_4 = too_expensive.intersection(too_cheap) # OPP
intersection_points = [intersection_1, intersection_2, intersection_3, intersection_4] # stores all intersections in a list

for i, intersection in enumerate(intersection_points):
  if(type(intersection) != shapely.geometry.point.Point): 
    intersection_points[i] = intersection.interpolate(0)
# Lines can sometimes overlap along a segment. In that case we take the first point of the overlap as the intersection

indicators = ['ro', 'go', 'yo', 'bo'] # red, green, yellow and blue circle markers


for point, indicator in zip(intersection_points, indicators):
  ax.plot(*point.xy, indicator)

# Round the x coordinate of each point for the text labels

IP = round(intersection_points[0].x)
PME = round(intersection_points[1].x)
PMC = round(intersection_points[2].x)
OPP = round(intersection_points[3].x)

# Here you fine tune the text tag position
ax.annotate('IP', xy=(intersection_points[0].x + 2.5, intersection_points[0].y - 0.02))
ax.annotate('PME', xy=(intersection_points[1].x + 2.5, intersection_points[1].y - 0.02))
ax.annotate('PMC', xy=(intersection_points[2].x + 2.5, intersection_points[2].y - 0.02))
ax.annotate('OPP', xy=(intersection_points[3].x + 2.5, intersection_points[3].y - 0.02))
##### END PREVIOUS CODE #######

# x and y are data coordinates: x = 80.5 (price units), y = -0.5 places the text below the axes
ax.text(80.5, -0.5, f'''Indifference Point (IP) = EUR {IP:,}
Point of Marginal Cheapness (PMC) = EUR {PMC:,}
Point of Marginal Expensiveness (PME) = EUR {PME:,}
Optimal Price Point (OPP) = EUR {OPP:,}''', fontsize=12)

plt.show()
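The text above is positioned in data coordinates, so `x = 80.5` only lands near the left edge when prices are in the hundreds; with other price ranges it may drift off the chart. A sketch of an alternative, assuming a hypothetical `add_summary_text` helper, uses axes coordinates instead, where (0, 0) is the lower-left corner of the plotting area whatever the prices are:

```python
import matplotlib.pyplot as plt

# Full label for each abbreviation used in this notebook
LABELS = {
    "IP": "Indifference Point (IP)",
    "PMC": "Point of Marginal Cheapness (PMC)",
    "PME": "Point of Marginal Expensiveness (PME)",
    "OPP": "Optimal Price Point (OPP)",
}


def add_summary_text(ax, prices, currency="EUR"):
    """Hypothetical helper: write the PSM prices below the axes.

    prices maps an abbreviation ("IP", "PMC", "PME", "OPP") to a price.
    """
    lines = [f"{LABELS[key]} = {currency} {round(value):,}" for key, value in prices.items()]
    # transform=ax.transAxes: (0, 0) is the axes' lower-left corner and (1, 1) the
    # upper-right one, so the position no longer depends on the price range
    return ax.text(0.0, -0.2, "\n".join(lines), transform=ax.transAxes,
                   va="top", fontsize=12)
```

Usage: `add_summary_text(ax, {"IP": IP, "PMC": PMC, "PME": PME, "OPP": OPP})`. Adjust the `-0.2` offset if it overlaps the x axis label, and save with `fig.savefig(..., bbox_inches="tight")` so the text outside the axes is included in the PNG.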

That's it! Now we create a function to put everything together

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import shapely
from shapely.geometry import LineString
# Currency is an optional parameter that defaults to "EUR"
def van_westendorp(data, currency="EUR"):
  
  # Checks that the file exists and has a supported type, raising an exception otherwise
   
  if os.path.exists(data):
    if data.endswith("json"):
        df = pd.read_json(data)
    elif data.endswith("csv"):
        df = pd.read_csv(data)
    elif data.endswith("xlsx") or data.endswith("xls"):
        df = pd.read_excel(data)
    else:
        raise Exception("Unsupported file type")
  else:
    raise Exception("File not found, check typos")
  


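  # Quick format and integrity check
  # Erase rows with null values (dropna returns a new data frame, so we reassign it)
  # and reset the index so that it runs again from 0 to n - 1
  df = df.dropna().reset_index(drop=True)
  # Trims and lowercases the columns' labels
  df.columns = df.columns.str.strip().str.lower()
  columns = ['too cheap', 'cheap', 'expensive', 'too expensive']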
  # Check if columns have the correct names
  if (set(df.columns) != set(columns)):
    raise Exception("Columns do not conform to requirements")


  # Creates two new columns named "CPer" and "1 - CPer" (Cumulative Percentage).
  n = len(df)  # number of respondents; unlike df.index.stop, this works with any index
  df['CPer'] = (np.arange(1, n + 1, 1)/n).round(3)
  # np.arange creates an array that starts at 1, stops before n + 1 (so the last value is n)
  # and increases by one unit at each step; each value is divided by n and rounded to three decimals.
  # With 20 rows, the first value is round(1/20, 3) = 0.05, the second round(2/20, 3) = 0.10,
  # and so on until the last one, round(20/20, 3) = 1.00
  df['1 - CPer'] = 1 - df['CPer']
  
  '''
  IP = Indifference Point
  PMC = Point of Marginal cheapness
  PME = Point of Marginal Expensiveness
  OPP = Optimal Price Point
  '''  

  fig, ax = plt.subplots(figsize=(10, 4))
  ax.plot(df['too expensive'].sort_values(), df['CPer']) # positive slope
  ax.plot(df['expensive'].sort_values(), df['CPer']) # positive slope
  ax.plot(df['cheap'].sort_values(), df['1 - CPer']) # negative slope
  ax.plot(df['too cheap'].sort_values(), df['1 - CPer']) # negative slope

  ax.legend(['too expensive', 'expensive', 'cheap',
              'too cheap'], loc="best") # set the name of legends
  ax.set_title("Van Westendorp's Price Sensitivity Meter", # set title
            pad=10, size=18, fontweight='bold')

  ax.set_xlabel(f'Price: {currency}') # x axis label with the currency chosen by the user
  ax.set_ylabel('Number of respondents (cumulative %)') # set y axis label

  ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) # formats y axis label as percentage

  ax.grid(True) # adds a grid




  # We use the intersection method of shapely's LineString objects

  too_expensive = LineString(list(zip(df['too expensive'].sort_values(), df['CPer'])))
  expensive = LineString(list(zip(df['expensive'].sort_values(), df['CPer'])))
  cheap = LineString(list(zip(df['cheap'].sort_values(), df['1 - CPer'])))
  too_cheap = LineString(list(zip(df['too cheap'].sort_values(), df['1 - CPer'])))

  """
    - Intersection of "cheap" and expensive" is the Indifference Point(IP)
    - Intersection of "cheap" and "too expensive" is the Point of Marginal Expensiveness (PME) 
    - Intersection of "too cheap" and "expensive" is the Point of Marginal Cheapness (PMC)
    - Intersection of "too cheap" and "too expensive" is the Optimal Price Point (OPP)
  """

  intersection_1 = expensive.intersection(cheap) # IP
  intersection_2 = too_expensive.intersection(cheap) # PME
  intersection_3 = expensive.intersection(too_cheap) # PMC
  intersection_4 = too_expensive.intersection(too_cheap) # OPP
  intersection_points = [intersection_1, intersection_2, intersection_3, intersection_4] # stores all intersections in a list

  for i, intersection in enumerate(intersection_points):
    if(type(intersection) != shapely.geometry.point.Point): 
      intersection_points[i] = intersection.interpolate(0)
  # Lines can sometimes overlap along a segment. In that case we take the first point of the overlap as the intersection

  indicators = ['ro', 'go', 'yo', 'bo'] # red, green, yellow and blue circle markers


  for point, indicator in zip(intersection_points, indicators):
    ax.plot(*point.xy, indicator)

  # Round the x coordinate of each point for the text labels

  IP = round(intersection_points[0].x)
  PME = round(intersection_points[1].x)
  PMC = round(intersection_points[2].x)
  OPP = round(intersection_points[3].x)

  # Here you fine tune the text tag position
  ax.annotate('IP', xy=(intersection_points[0].x + 2.5, intersection_points[0].y - 0.02))
  ax.annotate('PME', xy=(intersection_points[1].x + 2.5, intersection_points[1].y - 0.02))
  ax.annotate('PMC', xy=(intersection_points[2].x + 2.5, intersection_points[2].y - 0.02))
  ax.annotate('OPP', xy=(intersection_points[3].x + 2.5, intersection_points[3].y - 0.02))


  ax.text(5, -0.5, f'''  Indifference Point (IP) = {currency} {IP:,}
  Point of Marginal Cheapness (PMC) = {currency} {PMC:,}
  Point of Marginal Expensiveness (PME) = {currency} {PME:,}
  Optimal Price Point (OPP) = {currency} {OPP:,}''', fontsize=12)

  plt.show()
    
  

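`van_westendorp` raises a bare `Exception` for every problem, so callers can only tell the cases apart by their message. As a sketch, this hypothetical `load_survey(path)` keeps the same checks and messages but raises `FileNotFoundError` for a missing file and `ValueError` for a bad extension or bad columns, which lets tests use `pytest.raises(ValueError)` directly. The extension-to-reader mapping mirrors the one in the function above:

```python
from pathlib import Path

import pandas as pd

REQUIRED_COLUMNS = {"too cheap", "cheap", "expensive", "too expensive"}
# File extension -> pandas reader (Excel files need an engine such as openpyxl or xlrd)
READERS = {".csv": pd.read_csv, ".json": pd.read_json,
           ".xlsx": pd.read_excel, ".xls": pd.read_excel}


def load_survey(path):
    """Hypothetical loader with the same guards as van_westendorp, but specific exceptions."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"File not found, check typos: {path}")
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    df = reader(path)
    df.columns = df.columns.str.strip().str.lower()
    if set(df.columns) != REQUIRED_COLUMNS:
        raise ValueError("Columns do not conform to requirements")
    # Drop incomplete answers and restore a 0..n-1 index
    return df.dropna().reset_index(drop=True)
```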

Tests to check that the input guards work

In [None]:
%%ipytest -qq

# Test van_westendorp(file)
def test_van_westendorp_raises_exception_on_unexistent_file():
    with pytest.raises(Exception) as excinfo:
        van_westendorp("vwsurvey.png")
    assert excinfo.match(
        "File not found, check typos"
    ), f"Unexpected exception message: {excinfo.value}"


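def test_van_westendorp_raises_exception_on_invalid_file_type(tmp_path):
    # Create an existing file with an unsupported extension, so the existence check passes
    png_file = tmp_path / "output.png"
    png_file.write_bytes(b"")
    with pytest.raises(Exception) as excinfo:
        van_westendorp(str(png_file))

    assert excinfo.match(
        "Unsupported file type"
    ), f"Unexpected exception message: {excinfo.value}"


def test_van_westendorp_raises_exception_on_features_names(tmp_path):
    # Create a CSV whose column names do not match the required ones
    bad_csv = tmp_path / "vwsurveybad.csv"
    pd.DataFrame({"low": [1], "mid": [2], "high": [3], "top": [4]}).to_csv(bad_csv, index=False)
    with pytest.raises(Exception) as excinfo:
        van_westendorp(str(bad_csv))

    assert excinfo.match(
        "Columns do not conform to requirements"
    ), f"Unexpected exception message: {excinfo.value}"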


In order to generate the Van Westendorp graph, you need to add your file to this directory. It supports CSV, JSON and Excel files. If you don't have a file, you can simulate one with `generate_random_data(n)`, as explained previously.

In [None]:
file = 'vwsurvey.csv' 
# Default currency is EUR; to change it, pass the currency as a second argument of type string
# van_westendorp(file, "USD")
van_westendorp(file)