<h1 align=center style="line-height:200%;font-family=vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Better Chocolates
</font>
</h1>


<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
Suppose that, after trying several regular (non-dark) chocolates, we find that many of them are delicious too, and that the higher-quality ones are likely to sell well. In this final step of the project, we therefore identify the companies that produce higher-quality chocolates and select their products.
</font>
</p>


<h2 align=right style="line-height:200%;font-family=vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Dataset
</font>
</h2>

<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
In the previous step, we estimated the prices of the chocolates and saved the final dataframe in a file named <code>chocolate_price.csv</code>. We start by reading that file into a dataframe.
</font>
</p>


In [None]:
import numpy as np
import pandas as pd

df = pd.read_csv('chocolate_price.csv')
df.head()

<h2 align=right style="line-height:200%;font-family=vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Part One
</font>
</h2>
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
In this step, we work only with non-dark chocolates. We therefore first select the chocolates with a cocoa percentage of 70% or less and store them back in the dataframe <code>df</code>.
</font>
</p>
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
<span style="color:green"><b>Tip:</b></span>
After filtering, we reset the index with <code>reset_index()</code>. Two arguments matter here: <code>drop=True</code> discards the old index instead of keeping it as a new column, and <code>inplace=True</code> modifies the dataframe directly instead of returning a new one.
</font>
</p>


In [None]:
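# Keep only the non-dark chocolates (cocoa percentage of 70% or less)
sweet_chocolates = df[df['Cocoa Percent'] <= 70]

# Renumber the rows from 0; drop=True discards the old index
sweet_chocolates.reset_index(inplace=True, drop=True)

df = sweet_chocolates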
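
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
The effect of the <code>drop</code> argument is easiest to see on a small example. The sketch below uses a hypothetical four-row dataframe (the values are made up; only the column name <code>Cocoa Percent</code> comes from this notebook) and compares the result of <code>reset_index()</code> with and without <code>drop=True</code>.
</font>
</p>

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror this notebook.
toy = pd.DataFrame({
    "Company": ["A", "B", "C", "D"],
    "Cocoa Percent": [85, 70, 60, 72],
})

# Boolean filtering keeps the original row labels (here 1 and 2).
filtered = toy[toy["Cocoa Percent"] <= 70]

# drop=False (the default) moves the old labels into a new "index" column.
kept = filtered.reset_index()

# drop=True discards the old labels and renumbers the rows from 0.
dropped = filtered.reset_index(drop=True)

print(list(filtered.index))  # [1, 2]
print(list(kept.columns))    # ['index', 'Company', 'Cocoa Percent']
print(list(dropped.index))   # [0, 1]
```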

<h2 align=right style="line-height:200%;font-family=vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Part Two
</font>
</h2>
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
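In this part, we calculate, for each chocolate company, the average rating of its chocolates in each review year. We save the result in the dataframe <code>companies</code>, with one row per company and one column per year, as shown in the table below. A company with no reviewed chocolates in a given year gets <code>NaN</code> for that year.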

<center>

| | 2006 | ... | 2015 | 2016 | 2017 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| A. Morin | NaN | ... | 63.75 | 75.0 | NaN |
| ... | ... | ... | ... | ... | ...  |
| hexx | NaN | ... | 60.00 | NaN | NaN |
| twenty-four blackbirds | NaN | ... | NaN | NaN | NaN |

</center>
</font>
</p>



In [None]:
# Mean rating for every (company, review year) pair
average_scores = df.groupby(['Company', 'Review Date'])['Rating'].mean()
companies = average_scores.unstack()
companies
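
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
As a rough illustration of how <code>groupby</code> and <code>unstack</code> produce this wide table, the sketch below runs the same two calls on a hypothetical four-row dataframe. The data is invented; only the column names are taken from this notebook.
</font>
</p>

```python
import pandas as pd

# Hypothetical toy data: company A has reviews in two years, B in one.
toy = pd.DataFrame({
    "Company": ["A", "A", "A", "B"],
    "Review Date": [2015, 2015, 2016, 2015],
    "Rating": [60.0, 70.0, 80.0, 50.0],
})

# One mean per (company, year) pair, indexed by a two-level MultiIndex.
per_year = toy.groupby(["Company", "Review Date"])["Rating"].mean()

# unstack() moves the inner level (the year) into the columns.
# Pairs that never occur, such as ("B", 2016), become NaN.
wide = per_year.unstack()
print(wide)
```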

<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
The lines below remove the names of the row index and the column index, so that the table matches the layout shown above.
</font>
</p>


In [None]:
companies.columns.name = None
companies.index.name = None
companies

<h2 align=right style="line-height:200%;font-family=vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Part Three
</font>
</h2>
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
To find the best companies, we consider only the ratings from 2012 onward (2012 included).
First, for each company, we average its yearly mean ratings from 2012 onward and save the result in the variable <code>mean_ratings</code>.
The output should be a pandas Series (<code>pd.Series</code>).
</font>
</p>


In [None]:
# Year columns are assumed to be integers (as parsed by read_csv), so slice from 2012, not "2012"
mean_ratings = companies.loc[:, 2012:].mean(axis=1)

mean_ratings
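
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
The sketch below shows the two ideas behind this step on a hypothetical two-company table: label-based column slicing and a row-wise mean that skips missing years. It assumes the year columns are integers; if they were strings, the slice would start at <code>"2012"</code> instead.
</font>
</p>

```python
import numpy as np
import pandas as pd

# Hypothetical yearly averages; company B has no reviews after 2011.
wide = pd.DataFrame(
    {2011: [90.0, 40.0], 2012: [60.0, np.nan], 2013: [80.0, np.nan]},
    index=["A", "B"],
)

# Label-based slicing with .loc includes the starting label 2012.
recent = wide.loc[:, 2012:]

# mean(axis=1) averages across the columns of each row and skips NaN.
# A row that is entirely NaN yields NaN.
row_means = recent.mean(axis=1)

print(list(recent.columns))  # [2012, 2013]
print(row_means["A"])        # 70.0
```

<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
Note that this is an average of yearly averages, so every year counts equally regardless of how many chocolates were reviewed in it.
</font>
</p>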


<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
Now, using the average rating of each company (<code>mean_ratings</code>), we find the top 10 companies with the best ratings and save them in a variable <code>best_ratings</code>. Note that our output should be a pandas series (<code>pd.Series</code>).
</font>
</p>

<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
<span style="color:purple"><b>Note:</b></span>
In cases where companies have the same rating, we select the company based on alphabetical order. For example, if both companies <code>A</code> and <code>B</code> have an average rating of <code>75.00</code>, the priority goes to company <code>A</code>.
</font>
</p>


In [None]:

# Move the company names into a column so they can act as a tie-breaker
ranked = mean_ratings.rename_axis('company').reset_index(name='rating')
# Highest rating first; companies with equal ratings are ordered alphabetically
ranked = ranked.sort_values(by=['rating', 'company'], ascending=[False, True])

# Back to a Series indexed by company, keeping the top 10
best_ratings = ranked.set_index('company')['rating'].head(10)
best_ratings.index.name = None
best_ratings.name = None

best_ratings
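
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
The tie-breaking rule can be checked on a small example. The sketch below uses a hypothetical Series in which companies <code>A</code> and <code>B</code> share the same rating, and sorts by rating (descending) and then by company name (ascending). The names <code>scores</code>, <code>company</code> and <code>rating</code> are illustrative only.
</font>
</p>

```python
import pandas as pd

# Hypothetical mean ratings; A and B are tied at 75.0.
scores = pd.Series({"B": 75.0, "C": 80.0, "A": 75.0})

ranked = (
    scores.rename_axis("company")      # name the index so it becomes a column
    .reset_index(name="rating")        # two columns: company, rating
    .sort_values(by=["rating", "company"], ascending=[False, True])
    .set_index("company")["rating"]    # back to a Series indexed by company
)

# Keep the two best companies; the tie is resolved alphabetically.
top_two = ranked.head(2)

print(list(ranked.index))   # ['C', 'A', 'B']
print(list(top_two.index))  # ['C', 'A']
```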


<h2 align=right style="line-height:200%;font-family=vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Part Four
</font>
</h2>
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
Now we will select chocolates from the dataframe <code>df</code> whose manufacturers are among the top 10 companies (<code>best_ratings</code>) and save them in a dataframe called <code>chocolates_to_sell</code>.
</font>
</p>

<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
<span style="color:green"><b>Tip:</b></span>
After selecting these chocolates, we reset the index with <code>reset_index</code>, again passing the <code>drop</code> argument.
</font>
</p>


In [None]:
chocolates_to_sell = df[df["Company"].isin(best_ratings.index)]
chocolates_to_sell.reset_index(drop=True, inplace=True)

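
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
The selection relies on <code>isin</code>, which builds a boolean mask from a collection of allowed values. The sketch below applies it to hypothetical data, using the index of a small Series as the list of top companies. All values are invented.
</font>
</p>

```python
import pandas as pd

# Hypothetical chocolates and a hypothetical ranking of top companies.
chocolates = pd.DataFrame({
    "Company": ["A", "B", "C", "A"],
    "Rating": [80.0, 55.0, 75.0, 70.0],
})
top = pd.Series({"A": 75.0, "C": 75.0})

# True where the row's company is one of the labels in top.index.
mask = chocolates["Company"].isin(top.index)

# Keep the matching rows and renumber them from 0.
selected = chocolates[mask].reset_index(drop=True)

print(list(mask))                 # [True, False, True, True]
print(list(selected["Company"]))  # ['A', 'C', 'A']
```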



<h2 align=right style="line-height:200%;font-family=vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Part Five
</font>
</h2>
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
Finally, we save the total price of these chocolates (<code>chocolates_to_sell</code>) in the variable <code>priceSum</code> to determine our total revenue from these chocolates.
</font>
</p>


In [None]:
priceSum = chocolates_to_sell['price(100g)'].sum()


<h2 align=right style="line-height:200%;font-family=vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
<b>Answer Cell</b>
</font>
</h2>

<p dir=ltr style="direction: ltr;text-align: left;line-height:200%; font-family=vazir; font-size=medium">
<font face="vazir" size=3>
To create the file <code>result.zip</code>, we run the cell below. Before running it, we save all changes made in the notebook (<code>Ctrl+S</code>), so that our code can be reviewed if support is needed.
</font>
</p>


In [None]:
import zlib
import zipfile

def compress(file_names):
    print("File Paths:")
    print(file_names)
    compression = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile("result.zip", mode="w") as zf:
        for file_name in file_names:
            zf.write('./' + file_name, file_name, compress_type=compression)

df.to_csv('normal_chocolates.csv', index = True)
companies.to_csv('companies.csv', index = True)
chocolates_to_sell.to_csv('chocolates_to_sell.csv', index = True)
mean_ratings.to_csv("mean_ratings.csv", index = True)
best_ratings.to_csv("best_ratings.csv", index = True)

np.savez("answers.npz" ,priceSum = priceSum)
file_names = ["answers.npz", "project1_step3.ipynb", "chocolates_to_sell.csv", 
              "companies.csv", "mean_ratings.csv", "best_ratings.csv", "normal_chocolates.csv"]
compress(file_names)
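
<p dir=ltr style="direction: ltr;text-align: left;line-height:200%;font-family=vazir;font-size=medium">
<font face="vazir" size=3>
After creating an archive, it can be useful to confirm that it contains the expected files. The sketch below is an optional check, not part of the required answer cell. It builds a throwaway archive in a temporary directory (the file names <code>example.csv</code> and <code>example.zip</code> are hypothetical) and inspects it with <code>namelist()</code> and <code>testzip()</code>. The same two calls can be pointed at <code>result.zip</code>.
</font>
</p>

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a small hypothetical file to archive.
    src = os.path.join(tmp, "example.csv")
    with open(src, "w") as f:
        f.write("a,b\n1,2\n")

    # Compress it the same way the answer cell does.
    archive = os.path.join(tmp, "example.zip")
    with zipfile.ZipFile(archive, mode="w") as zf:
        zf.write(src, "example.csv", compress_type=zipfile.ZIP_DEFLATED)

    # Reopen the archive to list its members and check their integrity.
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        first_bad_file = zf.testzip()  # None means no corrupt member was found

print(names)           # ['example.csv']
print(first_bad_file)  # None
```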