##### Copyright 2024 Google LLC.

## Objectives

- Collect, explore, and preprocess historical sales data for analysis and modeling.

## Tasks

### 1. Data Collection
- Acquire historical sales dataset (daily/weekly, retail/e-commerce).
- Include features: sales, date, promotions, holidays, weather.

### 2. Data Exploration
- Perform EDA: trends, seasonality, missing values.
- Summary stats, outliers, patterns (e.g., sales vs. promotions).

### 3. Data Preprocessing
- Handle missing values, duplicates, inconsistencies.
- Add time features: day, month, seasonality, promotions.
- Scale/transform data (e.g., normalization).
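
The preprocessing tasks above can be sketched on a toy DataFrame. The values are made up for illustration, and the derived column names (`day`, `month`, `day_of_week`, `on_promo`, `sales_scaled`) are hypothetical choices rather than names this project prescribes. The sketch derives calendar features from the date and applies min-max normalization to sales.

```python
import pandas as pd

# Toy data for illustration only; not taken from Train.csv.
toy = pd.DataFrame({
    "date": ["2013-02-11", "2013-02-12", "2013-03-01"],
    "sales": [0.0, 172.0, 86.0],
    "onpromotion": [0, 3, 0],
})

# Parse dates so the .dt accessor is available.
toy["date"] = pd.to_datetime(toy["date"])

# Calendar features derived from the date column.
toy["day"] = toy["date"].dt.day
toy["month"] = toy["date"].dt.month
toy["day_of_week"] = toy["date"].dt.dayofweek  # Monday=0 ... Sunday=6

# Binary promotion flag: True when the promotion count is positive.
toy["on_promo"] = toy["onpromotion"] > 0

# Min-max normalization: rescale sales to the [0, 1] range.
sales_min, sales_max = toy["sales"].min(), toy["sales"].max()
toy["sales_scaled"] = (toy["sales"] - sales_min) / (sales_max - sales_min)

print(toy)
```

Min-max scaling divides by the range, so a constant column would need separate handling to avoid division by zero.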

## Project Steps & Team Assignments
1. **Notebook outlines & Handling Missing Values** - Assigned to: *Belal Khamis*
2. **Removing Duplicates** - Assigned to: *Marwa Kotb*
3. **Handling Outliers** - Assigned to: *Mahmoud Sabry*
4. **Standardization and Formatting** - Assigned to: *Mohamed Samy*
5. **Fixing typos and inconsistencies** - Assigned to: *Hoda Magdy*

In [34]:
import pandas as pd

# Load the training data from the Colab working directory.
df = pd.read_csv("/content/Train.csv")
df


Unnamed: 0,id,date,store_nbr,family,sales,onpromotion,city,state,type_x,cluster,transactions,type_y,locale,locale_name,description,transferred,dcoilwtico,Unnamed: 17
0,73062,2013-02-11,1,AUTOMOTIVE,0.00000,0,Quito,Pichincha,D,13,396,Holiday,National,Ecuador,Carnaval,False,97.010000,
1,73063,2013-02-11,1,BABY CARE,0.00000,0,Quito,Pichincha,D,13,396,Holiday,National,Ecuador,Carnaval,False,97.010000,
2,73064,2013-02-11,1,BEAUTY,0.00000,0,Quito,Pichincha,D,13,396,Holiday,National,Ecuador,Carnaval,False,97.010000,
3,73065,2013-02-11,1,BEVERAGES,172.00000,0,Quito,Pichincha,D,13,396,Holiday,National,Ecuador,Carnaval,False,97.010000,
4,73066,2013-02-11,1,BOOKS,0.00000,0,Quito,Pichincha,D,13,396,Holiday,National,Ecuador,Carnaval,False,97.010000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
322042,1297885,2015-01-01,25,POULTRY,279.16998,0,Salinas,Santa Elena,D,1,2202,Holiday,National,Ecuador,Primer dia del ano,False,64.077912,
322043,1297886,2015-01-01,25,PREPARED FOODS,121.94100,0,Salinas,Santa Elena,D,1,2202,Holiday,National,Ecuador,Primer dia del ano,False,64.077912,
322044,1297887,2015-01-01,25,PRODUCE,105.00000,0,Salinas,Santa Elena,D,1,2202,Holiday,National,Ecuador,Primer dia del ano,False,64.077912,
322045,1297888,2015-01-01,25,SCHOOL AND OFFICE SUPPLIES,0.00000,0,Salinas,Santa Elena,D,1,2202,Holiday,National,Ecuador,Primer dia del ano,False,64.077912,


In [35]:
df.shape

(322047, 18)

In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 322047 entries, 0 to 322046
Data columns (total 18 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   id            322047 non-null  int64  
 1   date          322047 non-null  object 
 2   store_nbr     322047 non-null  int64  
 3   family        322047 non-null  object 
 4   sales         322047 non-null  float64
 5   onpromotion   322047 non-null  int64  
 6   city          322047 non-null  object 
 7   state         322047 non-null  object 
 8   type_x        322047 non-null  object 
 9   cluster       322047 non-null  int64  
 10  transactions  322047 non-null  int64  
 11  type_y        322047 non-null  object 
 12  locale        322047 non-null  object 
 13  locale_name   322047 non-null  object 
 14  description   322047 non-null  object 
 15  transferred   322047 non-null  bool   
 16  dcoilwtico    322047 non-null  float64
 17  Unnamed: 17   0 non-null       float64
dtypes: b

In [37]:
df.describe()

Unnamed: 0,id,store_nbr,sales,onpromotion,cluster,transactions,dcoilwtico,Unnamed: 17
count,322047.0,322047.0,322047.0,322047.0,322047.0,322047.0,322047.0,0.0
mean,1682979.0,26.994672,406.383452,3.727136,8.531202,1734.11784,64.077912,
std,786249.3,15.595174,1246.88124,15.512095,4.713809,1050.335018,24.27175,
min,561.0,1.0,0.0,0.0,1.0,54.0,27.96,
25%,1010616.0,13.0,1.0,0.0,4.0,1030.0,44.94,
50%,1842406.0,27.0,19.0,0.0,9.0,1409.0,52.72,
75%,2209556.0,40.0,241.260505,1.0,13.0,2148.0,93.12,
max,3000887.0,54.0,124717.0,716.0,17.0,8359.0,107.95,
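
The summary above shows that `sales` is heavily right-skewed (median 19, maximum 124717). One common way to flag such values is the interquartile-range (IQR) rule. The sketch below applies it to a small made-up Series; the 1.5 multiplier is a conventional default, not something this notebook specifies.

```python
import pandas as pd

# Made-up sales values for illustration; the last one is an obvious outlier.
sales = pd.Series([0.0, 1.0, 19.0, 25.0, 240.0, 250.0, 124717.0])

# Quartiles and interquartile range.
q1 = sales.quantile(0.25)
q3 = sales.quantile(0.75)
iqr = q3 - q1

# Conventional IQR fences; 1.5 is an assumed default multiplier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Boolean mask of values outside the fences.
is_outlier = (sales < lower) | (sales > upper)
print(sales[is_outlier])
```

Whether flagged rows should be dropped, capped, or kept depends on the modeling goal; genuine sales spikes may be signal rather than noise.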


In [38]:
df['transferred'].value_counts()

transferred,count
False,308352
True,13695


In [39]:
# Drop the empty 'Unnamed: 17' column (0 non-null values per df.info()).
data = df.drop(columns=['Unnamed: 17'])
data.shape


(322047, 17)
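
Because `Unnamed: 17` has zero non-null values, it could also be removed without naming it. A minimal sketch on a toy DataFrame (column names and values are illustrative), using `dropna(axis=1, how="all")` to drop every column that is entirely empty:

```python
import numpy as np
import pandas as pd

# Toy frame with one fully empty column, as a trailing comma in a CSV can produce.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "sales": [0.0, np.nan, 5.0],   # partially missing: kept
    "Unnamed: 3": [np.nan] * 3,    # entirely missing: dropped
})

# how="all" drops a column only when every value in it is NaN.
cleaned = toy.dropna(axis=1, how="all")
print(cleaned.columns.tolist())
```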

In [40]:
data.isna().sum()

Unnamed: 0,0
id,0
date,0
store_nbr,0
family,0
sales,0
onpromotion,0
city,0
state,0
type_x,0
cluster,0


In [41]:
df['date'] = pd.to_datetime(df['date'], errors='coerce')
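
With `errors='coerce'`, values that cannot be parsed become `NaT` instead of raising an error, so it is worth counting them after the conversion. A small sketch on made-up strings:

```python
import pandas as pd

# Made-up date strings; the last one is deliberately invalid.
raw = pd.Series(["2013-02-11", "2015-01-01", "not a date"])

# Unparseable entries are coerced to NaT rather than raising an error.
parsed = pd.to_datetime(raw, errors="coerce")

# Count how many values failed to parse.
n_failed = parsed.isna().sum()
print(n_failed)
```

A non-zero count after conversion would point to malformed dates that need cleaning before time features are derived.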

In [44]:
df.duplicated().sum()


0

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: Prompting Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Prompting.ipynb"><img src="https://github.com/google-gemini/cookbook/blob/main/images/colab_logo_32px.png?raw=1" />Run in Google Colab</a>
  </td>
</table>

This notebook contains examples of how to write and run your first prompts with the Gemini API.

In [None]:
!pip install -U -q "google-generativeai>=0.7.2" # Install the Python SDK

In [None]:
import google.generativeai as genai

## Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see the [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) quickstart for an example.

In [None]:
from google.colab import userdata

# Read the key from the Colab Secret and pass it to the SDK.
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

## Run your first prompt

Use the `generate_content` method to generate responses to your prompts. You can pass text directly to `generate_content` and use the `.text` property to get the text content of the response.

In [None]:
model = genai.GenerativeModel('gemini-2.0-flash')
response = model.generate_content("Give me python code to sort a list")
print(response.text)

```python
# There are several ways to sort a list in Python. Here are the most common:

# 1. Using the `sort()` method (in-place sorting):
def sort_list_in_place(my_list):
  """Sorts a list in place using the sort() method.  Modifies the original list."""
  my_list.sort()  # Sorts the list in ascending order by default.
  # To sort in descending order:  my_list.sort(reverse=True)
  return my_list  # Returns the *same* list, now sorted.  Not strictly necessary, but can be convenient.


# 2. Using the `sorted()` function (creates a new sorted list):
def sort_list_new(my_list):
  """Creates a new sorted list from the original list using the sorted() function.
     The original list remains unchanged."""
  new_list = sorted(my_list) # Returns a new sorted list, leaving the original untouched.
  # To sort in descending order:  new_list = sorted(my_list, reverse=True)
  return new_list


# 3. Sorting with a custom key (using `sort()` or `sorted()`):

def sort_list_by_length(my_list):
  """So
```

## Use images in your prompt

Here you will download an image from a URL and pass that image in your prompt.

First, download the image and load it with PIL:

In [None]:
!curl -o image.jpg "https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg"



In [None]:
import PIL.Image
img = PIL.Image.open('image.jpg')
img


In [None]:
prompt = """This image contains a sketch of a potential product along with some notes.
Given the product sketch, describe the product as thoroughly as possible based on what you
see in the image, making sure to note all of the product features. Return output in json format:
{description: description, features: [feature1, feature2, feature3, etc]}"""

Then you can include the image in your prompt by passing a list of items to `generate_content`.

In [None]:
model = genai.GenerativeModel('gemini-2.0-flash')
response = model.generate_content([prompt, img])
print(response.text)

```json
{
  "description": "The product is a jetpack backpack designed to look like a normal, lightweight backpack. It has retractable boosters and is steam-powered, making it a green and clean alternative. It is designed to fit an 18\" laptop.  The backpack is USB-C chargeable and has a 15-minute battery life. It also features padded strap support.",
  "features": [
    "Lightweight",
    "Looks like a normal backpack",
    "Fits 18\" laptop",
    "Retractable Boosters",
    "Steam-powered (Green/Clean)",
    "USB-C Charging",
    "15-minute battery life",
    "Padded Strap Support"
  ]
}
```
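
The response above arrives as text wrapped in a Markdown `json` code fence, so it needs unwrapping before it can be used as data. A minimal sketch, assuming the reply contains either a single fenced block or bare JSON: `extract_json` is a hypothetical helper, not part of the SDK, and `sample` is a shortened stand-in for `response.text`.

```python
import json
import re

# Three backticks, built programmatically to keep this example easy to embed.
FENCE = "`" * 3

def extract_json(text):
    """Parse JSON from a model reply, removing a Markdown code fence if present."""
    # Match an optional fenced wrapper (with or without a "json" tag) and capture its contents.
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)

# Shortened stand-in for response.text from the cell above.
sample = (
    FENCE + "json\n"
    '{"description": "A jetpack backpack.", "features": ["Lightweight", "USB-C Charging"]}\n'
    + FENCE + "\n"
)

product = extract_json(sample)
print(product["features"])
```

If the model returns malformed JSON, `json.loads` raises `json.JSONDecodeError`, so real code would likely wrap the call in a `try`/`except`.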


## Have a chat

The Gemini API enables you to have freeform conversations across multiple turns.

The [ChatSession](https://ai.google.dev/api/python/google/generativeai/ChatSession) class will store the conversation history for multi-turn interactions.

In [None]:
model = genai.GenerativeModel('gemini-2.0-flash')
chat = model.start_chat(history=[])

In [None]:
response = chat.send_message("In one sentence, explain how a computer works to a young child.")
print(response.text)

A computer is like a super-smart toy that follows your instructions to play games, draw pictures, or tell you stories!



You can see the chat history:

In [None]:
print(chat.history)

[parts {
  text: "In one sentence, explain how a computer works to a young child."
}
role: "user"
, parts {
  text: "A computer is like a super-smart toy that follows your instructions to play games, draw pictures, or tell you stories!\n"
}
role: "model"
]
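
The printed history is a list of messages, each with a `role` and a list of `parts`. A minimal sketch of formatting it as a readable transcript: `format_history` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the `role` and `parts[...].text` attributes visible in the output above, so the sketch runs without an API call.

```python
from types import SimpleNamespace

def format_history(history):
    """Return one 'role: text' line per message in a chat history."""
    lines = []
    for message in history:
        # Join the text of every part; the messages shown above have one part each.
        text = "".join(part.text for part in message.parts).strip()
        lines.append(f"{message.role}: {text}")
    return "\n".join(lines)

# Stand-in objects shaped like the entries printed above (not real SDK types).
fake_history = [
    SimpleNamespace(role="user", parts=[SimpleNamespace(text="Hello")]),
    SimpleNamespace(role="model", parts=[SimpleNamespace(text="Hi there!\n")]),
]

transcript = format_history(fake_history)
print(transcript)
```

With a live session, `format_history(chat.history)` should behave the same way, assuming every part carries text.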


You can keep sending messages to continue the conversation:

In [None]:
response = chat.send_message("Okay, how about a more detailed explanation to a high schooler?")
print(response.text)

A computer operates by executing instructions stored in its memory, processing data through the CPU based on algorithms and logic, and utilizing input/output devices to interact with the user and external environment, all coordinated by an operating system that manages resources and provides a platform for applications.



## Set the temperature

Every prompt you send to the model includes parameters that control how the model generates responses. Use a `genai.GenerationConfig` to set these, or omit it to use the defaults.

Temperature controls the degree of randomness in token selection. Use higher values for more creative responses, and lower values for more deterministic responses.
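
To build intuition for temperature, the sketch below applies it to a softmax over three made-up token scores. This is a generic illustration of temperature scaling, not a description of how the Gemini models sample internally; `softmax_with_temperature` and the scores are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]

low = softmax_with_temperature(logits, 0.2)   # sharper: the top token dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: more randomness

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature nearly all probability mass sits on the highest-scoring token; at the higher temperature the alternatives become more likely to be picked.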

You can set the `generation_config` when creating the model.

In [None]:
model = genai.GenerativeModel(
    'gemini-2.0-flash',
    generation_config=genai.GenerationConfig(
        max_output_tokens=2000,
        temperature=0.9,
    ))

Or, set the `generation_config` on an individual call to `generate_content`. Any values set there override values on the model constructor.

Note: Although you can set the `candidate_count` in the `generation_config`, `gemini-2.0-flash` models will only return a single candidate at this time.

In [None]:
response = model.generate_content(
    'Give me a numbered list of cat facts.',
    # Limit to 5 facts.
    generation_config=genai.GenerationConfig(stop_sequences=['\n6'])
)

In [None]:
print(response.text)

Okay, here's a numbered list of fascinating cat facts:

1.  **Cats have a third eyelid:** This is called a nictitating membrane, and it's a translucent eyelid that protects and moistens the eye. It's usually only visible when a cat is sleepy or sick.

2.  **A cat's nose print is unique:** Just like human fingerprints, no two cats have the same nose print pattern.

3.  **Cats can't taste sweetness:** They lack the taste receptor gene that allows mammals to taste sweet flavors.

4.  **Cats are crepuscular animals:** This means they are most active during dawn and dusk, which is when their prey (rodents) are also most active.

5.  **Cats have flexible bodies thanks to their "floating" clavicle:** Unlike humans, a cat's collarbone isn't attached to other bones, allowing for greater flexibility and range of motion.
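
The reply ends after the fifth fact because generation halts at the stop sequence `'\n6'`, and the stop sequence itself does not appear in the output. The sketch below imitates that effect on a plain string; `truncate_at_stop` is a hypothetical helper for illustration, not how the API implements stopping.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

# Made-up numbered list standing in for an unrestricted model reply.
full_reply = "1. A\n2. B\n3. C\n4. D\n5. E\n6. F\n7. G"

limited = truncate_at_stop(full_reply, ["\n6"])
print(limited)
```

A stop sequence like `'\n6'` relies on the model numbering its list at the start of each line; a differently formatted reply would not be cut at the same place.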



## Learn more

There's lots more to learn!

* For more fun prompts, check out [Market a Jetpack](https://github.com/google-gemini/cookbook/blob/main/examples/Market_a_Jet_Backpack.ipynb).
* Check out the [safety quickstart](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Safety.ipynb) next to learn about the Gemini API's configurable safety settings, and what to do if your prompt is blocked.
* For lots more details on using the Python SDK, check out this [detailed quickstart](https://ai.google.dev/tutorials/python_quickstart).