# Data Concepts
Data is a collection of information gathered through observation, measurement, research, or analysis. It may consist of facts, numbers, names, figures, or descriptions of things. Data is often organized and presented in tables, charts, or graphs.

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('sephora_website_dataset.csv')

In [3]:
display(df)

Unnamed: 0,id,brand,category,name,size,rating,number_of_reviews,love,price,value_price,...,MarketingFlags,MarketingFlags_content,options,details,how_to_use,ingredients,online_only,exclusive,limited_edition,limited_time_offer
0,2218774,Acqua Di Parma,Fragrance,Blu Mediterraneo MINIATURE Set,5 x 0.16oz/5mL,4.0,4,3002,66.0,75.0,...,True,online only,no options,This enchanting set comes in a specially handc...,Suggested Usage:-Fragrance is intensified by t...,Arancia di Capri Eau de Toilette: Alcohol Dena...,1,0,0,0
1,2044816,Acqua Di Parma,Cologne,Colonia,0.7 oz/ 20 mL,4.5,76,2700,66.0,66.0,...,True,online only,- 0.7 oz/ 20 mL Spray - 1.7 oz/ 50 mL Eau d...,An elegant timeless scent filled with a fresh-...,no instructions,unknown,1,0,0,0
2,1417567,Acqua Di Parma,Perfume,Arancia di Capri,5 oz/ 148 mL,4.5,26,2600,180.0,180.0,...,True,online only,- 1oz/30mL Eau de Toilette - 2.5 oz/ 74 mL E...,Fragrance Family: Fresh Scent Type: Fresh Citr...,no instructions,Alcohol Denat.- Water- Fragrance- Limonene- Li...,1,0,0,0
3,1417617,Acqua Di Parma,Perfume,Mirto di Panarea,2.5 oz/ 74 mL,4.5,23,2900,120.0,120.0,...,True,online only,- 1 oz/ 30 mL Eau de Toilette Spray - 2.5 oz/...,Panarea near Sicily is an an island suspended ...,no instructions,unknown,1,0,0,0
4,2218766,Acqua Di Parma,Fragrance,Colonia Miniature Set,5 x 0.16oz/5mL,3.5,2,943,72.0,80.0,...,True,online only,no options,The Colonia Miniature Set comes in an iconic A...,Suggested Usage:-Fragrance is intensified by t...,Colonia: Alcohol Denat.- Water- Fragrance- Lim...,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9163,2208502,SEPHORA COLLECTION,Face Masks,The Rose Gold Mask,no size,2.0,15,6200,6.0,6.0,...,True,limited edition · exclusive,no options,What it is: A limited-edition- nurturing and h...,Suggested Usage:-Unfold the mask.-Apply the ma...,-Rose Quartz Extract: Hydrates dry skin. Aqua...,0,1,1,0
9164,2298909,SEPHORA COLLECTION,Lip Sets,Give Me Some Sugar Colorful Gloss Balm Set,3 x 0.32 oz/ 9 g,0.0,0,266,15.0,27.0,...,True,exclusive,no options,What it is: A set of three bestselling Colorfu...,Suggested Usage:-Apply directly to lips using ...,Colorful Gloss Balm Wanderlust: Hydrogenated P...,0,1,0,0
9165,2236750,SEPHORA COLLECTION,Tinted Moisturizer,Weekend Warrior Tone Up Cream,0.946 oz/ 28 mL,0.0,0,445,16.0,16.0,...,True,exclusive,no options,What it is: A weightless complexion booster- i...,Suggested Usage:-Use this product as the last ...,Aqua (Water)- Dimethicone- Isohexadecane- Poly...,0,1,0,0
9166,50,SEPHORA COLLECTION,no category,Gift Card,no size,5.0,46,0,50.0,50.0,...,False,0,no options,What it is:- Available in denominations of $10...,no instructions,unknown,0,0,0,0


## Categorical Data
Categorical data represents categories or groups rather than quantities. It is either nominal, where the categories have no inherent order, or ordinal, where the categories follow a meaningful order or hierarchy.

### Nominal
Nominal data is categorical data whose categories have no inherent order or ranking. The labels identify distinct groups and carry no numerical meaning. The `category` column is an example: `Fragrance` is neither greater nor less than `Cologne`.

In [4]:
category = df['category']

In [5]:
category

0                Fragrance
1                  Cologne
2                  Perfume
3                  Perfume
4                Fragrance
               ...        
9163            Face Masks
9164              Lip Sets
9165    Tinted Moisturizer
9166           no category
9167           no category
Name: category, Length: 9168, dtype: object
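
A minimal sketch of how nominal data can be handled in pandas. It uses a small hand-written sample (an assumption, not the full CSV) that mirrors the first few values of `df['category']`, and converts it to the unordered `category` dtype. The variable names `category_cat`, `counts`, and `labels` are illustrative.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of df['category'].
category = pd.Series(
    ["Fragrance", "Cologne", "Perfume", "Perfume", "Fragrance"],
    name="category",
)

# Convert the generic object dtype to pandas' categorical dtype.
# By default the result is unordered, which matches nominal data.
category_cat = category.astype("category")

# Counting members per label is meaningful for nominal data.
counts = category_cat.value_counts()

# The distinct labels and the "ordered" flag of the categorical.
labels = list(category_cat.cat.categories)
is_ordered = category_cat.cat.ordered

print(counts)
print(labels, is_ordered)
```

Because the categorical is unordered, comparisons such as `<` or `>` between labels are rejected by pandas, which is the behavior nominal data calls for.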

### Ordinal
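Ordinal data is categorical data whose categories have a natural order or ranking. Unlike nominal data, the sequence of the categories is meaningful, although the distance between neighboring categories is not necessarily equal. The `rating` column is treated as ordinal here: it is stored as `float64`, but a higher value always means a better rating.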

In [6]:
rating = df['rating']

In [7]:
rating

0       4.0
1       4.5
2       4.5
3       4.5
4       3.5
       ... 
9163    2.0
9164    0.0
9165    0.0
9166    5.0
9167    0.0
Name: rating, Length: 9168, dtype: float64
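
One way to make the ordering explicit is an ordered categorical dtype. The sketch below assumes the rating scale runs from 0.0 to 5.0 in half-point steps; that scale is an assumption based on the values visible above, not something the dataset documents. The sample values are hand-written.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample of ratings taken from the output shown above.
rating = pd.Series([4.0, 4.5, 4.5, 4.5, 3.5, 2.0, 0.0], name="rating")

# Assumed scale: 0.0, 0.5, ..., 5.0 (half-point steps).
levels = [step / 2 for step in range(0, 11)]
rating_dtype = CategoricalDtype(categories=levels, ordered=True)

# Cast the floats to an ordered categorical.
rating_ord = rating.astype(rating_dtype)

# Ordered categoricals support min/max and comparisons against a level.
lowest = rating_ord.min()
highest = rating_ord.max()
n_high = int((rating_ord >= 4.0).sum())

print(lowest, highest, n_high)
```

With `ordered=True`, pandas accepts rank-based operations such as `min`, `max`, and `>=`, while arithmetic such as a mean is not defined on the categorical, which reflects the ordinal interpretation.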

## Numerical Data
Numerical data consists of numbers that quantify a variable and support arithmetic. In Python it is represented with numeric types such as integers (`int`) and floating-point numbers (`float`).

### Discrete
Discrete data consists of distinct, separate values, typically integers, with no valid values in between. It usually represents counts: a product can have 4 or 5 reviews, but not 4.5.

In [8]:
number_of_reviews = df['number_of_reviews']

In [9]:
number_of_reviews

0        4
1       76
2       26
3       23
4        2
        ..
9163    15
9164     0
9165     0
9166    46
9167     0
Name: number_of_reviews, Length: 9168, dtype: int64
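
A short sketch of typical operations on discrete data, using a hand-written sample of review counts copied from the output above (an assumption standing in for the full column). It checks that the dtype is an integer type and computes simple count-based summaries.

```python
import pandas as pd

# Hypothetical sample of df['number_of_reviews'].
number_of_reviews = pd.Series(
    [4, 76, 26, 23, 2, 15, 0, 0, 46, 0],
    name="number_of_reviews",
)

# Discrete counts are stored with an integer dtype.
is_integer = pd.api.types.is_integer_dtype(number_of_reviews)

# Sums and frequency counts are natural summaries for counts.
total_reviews = int(number_of_reviews.sum())
products_without_reviews = int((number_of_reviews == 0).sum())

print(is_integer, total_reviews, products_without_reviews)
```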

### Continuous
Continuous data can take any value within a given range, including fractions and decimals. It is typically represented by real numbers, and between any two values there are infinitely many possible values. Prices are treated as continuous here, although in practice currency amounts are usually recorded to a fixed number of decimal places.

In [10]:
value_price = df['value_price']

In [11]:
value_price

0        75.0
1        66.0
2       180.0
3       120.0
4        80.0
        ...  
9163      6.0
9164     27.0
9165     16.0
9166     50.0
9167     50.0
Name: value_price, Length: 9168, dtype: float64
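
Continuous values are often summarized with a mean or grouped into intervals. The sketch below uses a hand-written sample of prices from the output above and bins them with `pd.cut`. The bin edges and the labels `budget`, `mid`, and `premium` are hypothetical choices made only for illustration.

```python
import pandas as pd

# Hypothetical sample of df['value_price'].
value_price = pd.Series(
    [75.0, 66.0, 180.0, 120.0, 80.0, 6.0, 27.0, 16.0, 50.0, 50.0],
    name="value_price",
)

# The mean is meaningful for continuous data.
mean_price = value_price.mean()

# Hypothetical price bands: (0, 25], (25, 100], (100, 200].
bins = [0, 25, 100, 200]
band_labels = ["budget", "mid", "premium"]
price_band = pd.cut(value_price, bins=bins, labels=band_labels)

# Count products per band, keeping the bands in their defined order.
band_counts = price_band.value_counts().sort_index()

print(mean_price)
print(band_counts)
```

Binning turns a continuous variable into an ordinal one, which can be convenient for grouping but discards the exact values.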

## Boolean
In Python, a boolean refers to a data type that can have one of two values: True or False. Booleans are used to represent logical states and are fundamental for controlling the flow of programs through conditional statements and boolean operations.

In [12]:
MarketingFlags = df['MarketingFlags']

In [13]:
MarketingFlags

0        True
1        True
2        True
3        True
4        True
        ...  
9163     True
9164     True
9165     True
9166    False
9167    False
Name: MarketingFlags, Length: 9168, dtype: bool
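
Boolean columns are commonly used as masks to filter rows. The following sketch builds a tiny hypothetical DataFrame (the three rows are illustrative, using product names from the output above) and selects rows by the `MarketingFlags` column.

```python
import pandas as pd

# Hypothetical three-row sample with a boolean column.
sample = pd.DataFrame(
    {
        "name": ["Colonia", "Arancia di Capri", "Gift Card"],
        "MarketingFlags": [True, True, False],
    }
)

# A boolean Series used as a mask keeps only the rows where it is True.
flagged = sample[sample["MarketingFlags"]]

# The ~ operator negates the mask element-wise.
unflagged = sample[~sample["MarketingFlags"]]

# True counts as 1 and False as 0, so the mean is the share of True values.
share_flagged = sample["MarketingFlags"].mean()

print(flagged)
print(unflagged)
print(share_flagged)
```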

## Metadata

Metadata is information that describes or gives context to other data. It can cover the structure of the data, its source, and how it was generated or is used. Metadata makes data easier to understand and manage.

For example, metadata for a music file might include the song title, artist name, album, release year, and music genre. For a database, metadata could include table names, column names, data types, and primary keys. Metadata is also used in the context of big data to provide information about the origin, size, structure, and characteristics of large and complex data.
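
For a pandas DataFrame, the column names, data types, and shape are a simple form of metadata. The sketch below collects them into a dictionary for a small hypothetical sample. The `metadata` dictionary and its keys are illustrative names, not a pandas feature.

```python
import pandas as pd

# Hypothetical two-row sample with a few of the dataset's columns.
sample = pd.DataFrame(
    {
        "brand": ["Acqua Di Parma", "Acqua Di Parma"],
        "rating": [4.0, 4.5],
        "number_of_reviews": [4, 76],
        "MarketingFlags": [True, True],
    }
)

# Collect structural metadata: size, column names, and dtypes.
metadata = {
    "n_rows": sample.shape[0],
    "n_columns": sample.shape[1],
    "columns": list(sample.columns),
    "dtypes": {col: str(dtype) for col, dtype in sample.dtypes.items()},
}

for key, value in metadata.items():
    print(key, value)
```

On a real DataFrame, `df.info()` prints a similar summary directly.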

## Big Data

Big data refers to data sets so large and complex that they are difficult to manage, process, and analyze with traditional data processing methods. Big data is commonly characterized by the "three Vs": volume (a large amount of data), velocity (data arriving at high speed), and variety (data in many types and formats).