In [104]:
import pandas as pd


# What will be covered

We will use the `categories` dictionary from our project to explore different ways of structuring dictionary data and when each is useful. We will also convert each structure to a DataFrame:
1. Regular Dictionary Structure:
    - Best for simple hierarchical data with straightforward access needs. It’s efficient and easy but may become cumbersome with complexity.
2. Nested Dictionary Structure:
    - Suited to hierarchical data where the relationships are clearly defined and the dataset is manageable in size.
3. Use of a List of Dictionaries:
    - Suitable for flexible data management with easy integration into data processing tools. Offers greater flexibility but with some redundancy.
4. Use of a Class-based Structure:
    - Ideal for complex, dynamic data where encapsulation and behavior are important. It’s more maintainable but adds complexity.

***Choosing the right approach depends on the specific requirements of your project, including data complexity, scalability needs, and the tools you plan to use.***

## What Is a Dictionary
* A Python dictionary is a built-in data structure. 
* It allows you to store and manage data in key-value pairs. 
* It is similar to a real-world dictionary where each key (word) maps to a value (definition).

### Key Features of Python Dictionaries
Key-Value Pairs:
- Key: A unique identifier used to access the corresponding value.
- Value: The data associated with the key. It can be of any data type.

Ordered:
- Since Python 3.7, dictionaries preserve insertion order: items are retrieved in the order in which they were inserted. Items are still accessed by key, not by position.

Mutable:
- You can change, add, or remove items from a dictionary after it has been created.

Unique Keys:
- Keys in a dictionary must be unique. If you try to insert a duplicate key, the new value will overwrite the old value associated with that key.
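
A minimal sketch of these features, using a small hypothetical `platforms` dictionary that is not part of the project data:

```python
# Hypothetical example data, not taken from the project
platforms = {'ps4': 'sony', 'wii': 'nintendo'}

# Key-value pairs: access a value through its key
wii_maker = platforms['wii']

# Mutable: add a new pair, then remove an existing one
platforms['xone'] = 'xbox'
removed = platforms.pop('ps4')

# Unique keys: assigning to an existing key overwrites the old value
platforms['wii'] = 'Nintendo'

print(wii_maker, removed, platforms)
```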



# 1. Regular Dictionary Structure

In [105]:
categories = {
    'nintendo': ['3ds', 'dsiw', 'dsi', 'ds', 'wii', 'wiiu', 'ns', 'gb', 'gba', 'nes', 'snes', 'gbc', 'n64', 'vb', 'gc', 'vc','ww'],
    'pc': ['linux', 'osx', 'pc', 'arc', 'all', 'fmt', 'c128', 'aco'],
    'xbox': ['x360', 'xone', 'series', 'xbl', 'xb', 'xs'],
    'sony': ['ps', 'ps2', 'ps3', 'ps4', 'ps5', 'psp', 'psv', 'psn', 'cdi'],
    'mobile': ['ios', 'and', 'winp', 'ngage', 'mob'],
    'sega': ['gg', 'msd', 'ms', 'gen', 'scd', 'sat', 's32x', 'dc'],
    'atari': ['2600', '7800', '5200', 'aj', 'int'],
    'commodore': ['amig', 'c64', 'cd32'],
    'other': ['ouya', 'or', 'acpc', 'ast', 'apii', 'pce', 'zxs', 'lynx', 'ng', 'zxs', '3do', 'pcfx', 'ws', 'brw', 'cv', 'giz', 'msx', 'tg16', 'bbcm']
}
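
With this structure, getting the consoles for a manufacturer is a single key access, but finding the manufacturer for a given console means scanning every list. The sketch below shows both directions on a shortened copy of `categories`; the names `sample` and `console_to_mfg` are hypothetical and introduced only for illustration.

```python
# Shortened copy of the categories dictionary, for illustration only
sample = {
    'xbox': ['x360', 'xone', 'xb'],
    'sony': ['ps', 'ps2', 'psp'],
    'commodore': ['amig', 'c64', 'cd32'],
}

# Forward lookup: manufacturer -> list of consoles
sony_consoles = sample['sony']

# Reverse lookup: build a console -> manufacturer mapping once
console_to_mfg = {
    console: manufacturer
    for manufacturer, consoles in sample.items()
    for console in consoles
}

print(sony_consoles)
print(console_to_mfg['c64'])
print(console_to_mfg.get('unknown', 'not found'))
```

If the same console code appeared under two manufacturers, the later entry would silently win in the reverse mapping, so this approach assumes codes are unique.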

Converting to a DataFrame

In [106]:
categoriesList = []
for manufacturer, consoles in categories.items():
    for console in consoles:
        categoriesList.append({'manufacturer': manufacturer, 'console': console})

# Converting the list to a DataFrame
mfg_list = pd.DataFrame(categoriesList)

mfg_list

Unnamed: 0,manufacturer,console
0,nintendo,3ds
1,nintendo,dsiw
2,nintendo,dsi
3,nintendo,ds
4,nintendo,wii
...,...,...
75,other,cv
76,other,giz
77,other,msx
78,other,tg16
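
The explicit loop is easy to follow, but the same flattening can probably be written more compactly with pandas alone. The following is a sketch on a shortened copy of the dictionary (the names `sample` and `compact` are hypothetical), assuming a pandas version that provides `Series.explode`:

```python
import pandas as pd

# Shortened copy of the flat categories dictionary, for illustration only
sample = {
    'xbox': ['x360', 'xone'],
    'commodore': ['amig', 'c64', 'cd32'],
}

# Build a Series of lists keyed by manufacturer, then expand to one row per console
compact = (
    pd.Series(sample)
    .explode()
    .rename_axis('manufacturer')
    .reset_index(name='console')
)

print(compact)
```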


# 2. Nested Dictionary Structure
Instead of keeping one flat list of items per manufacturer, you can nest a second dictionary inside each category. This approach allows you to access items by their category and sub-category.

In [107]:
categories = {
    'nintendo': {
        'handheld': ['3ds', 'dsiw', 'dsi', 'ds'],
        'console': ['wii', 'wiiu', 'ns', 'gb', 'gba', 'nes', 'snes', 'gbc', 'n64', 'vb', 'gc', 'vc', 'ww']
    },
    'pc': {
        'desktop': ['linux', 'osx', 'pc'],
        'arcade': ['arc', 'all', 'fmt', 'c128', 'aco']
    },
    'xbox': {
        'xbox360': ['x360'],
        'xboxone': ['xone'],
        'series': ['series'],
        'xboxlive': ['xbl'],
        'xboxoriginal': ['xb'],
        'xboxseries': ['xs']
    },
    'sony': {
        'playstation': ['ps', 'ps2', 'ps3', 'ps4', 'ps5'],
        'portable': ['psp', 'psv'],
        'network': ['psn'],
        'other': ['cdi']
    },
    'mobile': {
        'ios': ['ios'],
        'android': ['and'],
        'windows': ['winp'],
        'nokia': ['ngage'],
        'other': ['mob']
    },
    'sega': {
        'handheld': ['gg'],
        'home': ['msd', 'ms', 'gen', 'scd', 'sat', 's32x', 'dc']
    },
    'atari': {
        'classic': ['2600', '7800', '5200'],
        'jaguar': ['aj'],
        'interim': ['int']
    },
    'commodore': {
        'amiga': ['amig'],
        'c64': ['c64'],
        'cd32': ['cd32']
    },
    'other': {
        'ouya': ['ouya'],
        'various': ['or', 'acpc', 'ast', 'apii', 'pce', 'zxs', 'lynx', 'ng', 'zxs', '3do', 'pcfx', 'ws', 'brw', 'cv', 'giz', 'msx', 'tg16', 'bbcm']
    }
}
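
Accessing items by category and sub-category then takes two keys. The sketch below uses a shortened copy of the nested dictionary (the name `nested` is hypothetical) and shows a direct lookup as well as a safe lookup with `dict.get` for keys that may be missing.

```python
# Shortened copy of the nested categories dictionary, for illustration only
nested = {
    'sony': {
        'playstation': ['ps', 'ps2', 'ps3', 'ps4', 'ps5'],
        'portable': ['psp', 'psv'],
    },
    'sega': {
        'handheld': ['gg'],
        'home': ['msd', 'ms', 'gen', 'scd', 'sat', 's32x', 'dc'],
    },
}

# Direct access: category first, then type
sony_portables = nested['sony']['portable']

# Safe access: fall back to empty containers when a key is missing
missing = nested.get('atari', {}).get('classic', [])

# List the types available within one category
sega_types = list(nested['sega'])

print(sony_portables, missing, sega_types)
```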

### Steps to Convert Nested Dictionary Structure to DataFrame
- Flatten the Nested Dictionary: Extract the data into a format that can be easily represented in a tabular form.
- Create DataFrames: Use the flattened data to create a DataFrame.

In [108]:
# Flatten the nested dictionary
data = []
for category_name, types in categories.items():
    for type_name, items in types.items():
        for item in items:
            data.append({'Category': category_name, 'Type': type_name, 'Item': item})

# Create DataFrame from the flattened data
df = pd.DataFrame(data)

# Display the DataFrame
df

Unnamed: 0,Category,Type,Item
0,nintendo,handheld,3ds
1,nintendo,handheld,dsiw
2,nintendo,handheld,dsi
3,nintendo,handheld,ds
4,nintendo,console,wii
...,...,...,...
75,other,various,cv
76,other,various,giz
77,other,various,msx
78,other,various,tg16


# 3. Use a List of Dictionaries

If you need to work with a flat structure but want to keep some contextual information, you could use a list of dictionaries. Each dictionary represents one category/type group, storing its items alongside the metadata that describes them.

In [109]:
categories = [
    {'category': 'nintendo', 'type': 'handheld', 'items': ['3ds', 'dsiw', 'dsi', 'ds']},
    {'category': 'nintendo', 'type': 'console', 'items': ['wii', 'wiiu', 'ns', 'gb', 'gba', 'nes', 'snes', 'gbc', 'n64', 'vb', 'gc', 'vc', 'ww']},
    {'category': 'pc', 'type': 'desktop', 'items': ['linux', 'osx', 'pc']},
    {'category': 'pc', 'type': 'arcade', 'items': ['arc', 'all', 'fmt', 'c128', 'aco']},
    {'category': 'xbox', 'type': 'xbox360', 'items': ['x360']},
    {'category': 'xbox', 'type': 'xboxone', 'items': ['xone']},
    {'category': 'xbox', 'type': 'series', 'items': ['series']},
    {'category': 'xbox', 'type': 'xboxlive', 'items': ['xbl']},
    {'category': 'xbox', 'type': 'xboxoriginal', 'items': ['xb']},
    {'category': 'xbox', 'type': 'xboxseries', 'items': ['xs']},
    {'category': 'sony', 'type': 'playstation', 'items': ['ps', 'ps2', 'ps3', 'ps4', 'ps5']},
    {'category': 'sony', 'type': 'portable', 'items': ['psp', 'psv']},
    {'category': 'sony', 'type': 'network', 'items': ['psn']},
    {'category': 'sony', 'type': 'other', 'items': ['cdi']},
    {'category': 'mobile', 'type': 'ios', 'items': ['ios']},
    {'category': 'mobile', 'type': 'android', 'items': ['and']},
    {'category': 'mobile', 'type': 'windows', 'items': ['winp']},
    {'category': 'mobile', 'type': 'nokia', 'items': ['ngage']},
    {'category': 'mobile', 'type': 'other', 'items': ['mob']},
    {'category': 'sega', 'type': 'handheld', 'items': ['gg']},
    {'category': 'sega', 'type': 'home', 'items': ['msd', 'ms', 'gen', 'scd', 'sat', 's32x', 'dc']},
    {'category': 'atari', 'type': 'classic', 'items': ['2600', '7800', '5200']},
    {'category': 'atari', 'type': 'jaguar', 'items': ['aj']},
    {'category': 'atari', 'type': 'interim', 'items': ['int']},
    {'category': 'commodore', 'type': 'amiga', 'items': ['amig']},
    {'category': 'commodore', 'type': 'c64', 'items': ['c64']},
    {'category': 'commodore', 'type': 'cd32', 'items': ['cd32']},
    {'category': 'other', 'type': 'ouya', 'items': ['ouya']},
    {'category': 'other', 'type': 'various', 'items': ['or', 'acpc', 'ast', 'apii', 'pce', 'zxs', 'lynx', 'ng', 'zxs', '3do', 'pcfx', 'ws', 'brw', 'cv', 'giz', 'msx', 'tg16', 'bbcm']}
]

# Example usage
# for category in categories:
#     print(f"Category: {category['category']}, Type: {category['type']}")
#     print(f"Items: {', '.join(category['items'])}")
#     print()
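
Because every record carries its own `category` and `type`, filtering and sorting need nothing more than comprehensions and `sorted`. A minimal sketch with a shortened copy of the list (the name `records` is hypothetical):

```python
# Shortened copy of the list of dictionaries, for illustration only
records = [
    {'category': 'sony', 'type': 'playstation', 'items': ['ps', 'ps2', 'ps3', 'ps4', 'ps5']},
    {'category': 'sony', 'type': 'portable', 'items': ['psp', 'psv']},
    {'category': 'sega', 'type': 'handheld', 'items': ['gg']},
]

# Keep only the records that belong to one category
sony_records = [r for r in records if r['category'] == 'sony']

# Collect every item code from the filtered records into one flat list
sony_items = [item for r in sony_records for item in r['items']]

# Sort records by how many items each one holds, largest first
by_size = sorted(records, key=lambda r: len(r['items']), reverse=True)

print(sony_items)
print([r['type'] for r in by_size])
```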


To convert a list of dictionaries to a DataFrame: 
- You can directly use the `pd.DataFrame()` constructor from the Pandas library. 
- Each dictionary in the list represents a row in the DataFrame 
- The dictionary keys become the column names.

In [110]:
df = pd.DataFrame(categories)

# Display the DataFrame
df

Unnamed: 0,category,type,items
0,nintendo,handheld,"[3ds, dsiw, dsi, ds]"
1,nintendo,console,"[wii, wiiu, ns, gb, gba, nes, snes, gbc, n64, ..."
2,pc,desktop,"[linux, osx, pc]"
3,pc,arcade,"[arc, all, fmt, c128, aco]"
4,xbox,xbox360,[x360]
5,xbox,xboxone,[xone]
6,xbox,series,[series]
7,xbox,xboxlive,[xbl]
8,xbox,xboxoriginal,[xb]
9,xbox,xboxseries,[xs]


Exploding the list in items

In [111]:
# Explode the 'items' list into separate rows
df_exploded = df.explode('items')
df_exploded

Unnamed: 0,category,type,items
0,nintendo,handheld,3ds
0,nintendo,handheld,dsiw
0,nintendo,handheld,dsi
0,nintendo,handheld,ds
1,nintendo,console,wii
...,...,...,...
28,other,various,cv
28,other,various,giz
28,other,various,msx
28,other,various,tg16


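### Explanation

- Convert to DataFrame: Use `pd.DataFrame(categories)` to convert the list of dictionaries to a DataFrame. Each dictionary in the list is treated as a row, and the keys of the dictionaries are used as column names.

- Explode the lists: Use `df.explode('items')` to expand each list in the `items` column into one row per item.

- Display the DataFrame: In a notebook, ending a cell with `df` renders the DataFrame; in a script, use `print(df)`.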
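
Note that `explode` repeats the original row label for every item, which is why the index in the output above reads `0, 0, 0, 0, 1, ...`. If a fresh 0..n-1 index is preferred, a sketch along these lines should work, assuming pandas 1.1 or later for the `ignore_index` parameter; the small `sample` frame is hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame with a list-valued 'items' column
sample = pd.DataFrame([
    {'category': 'sony', 'type': 'portable', 'items': ['psp', 'psv']},
    {'category': 'sega', 'type': 'handheld', 'items': ['gg']},
])

# Default behaviour: each exploded row keeps the label of its source row
repeated = sample.explode('items')

# ignore_index=True renumbers the rows from 0 (pandas 1.1+)
renumbered = sample.explode('items', ignore_index=True)

print(repeated.index.tolist())
print(renumbered.index.tolist())
```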

# 4. Use a Class-based Structure
For more complex cases, you might use classes to represent categories and items, allowing more control and structure.

In [112]:
class Category:
    def __init__(self, name):
        self.name = name
        self.items = {}

    def add_items(self, type_name, items):
        self.items[type_name] = items

    def __repr__(self):
        return f"Category(name={self.name}, items={self.items})"


class CategoryManager:
    def __init__(self):
        self.categories = {}

    def add_category(self, name):
        if name not in self.categories:
            self.categories[name] = Category(name)
        else:
            print(f"Category '{name}' already exists.")

    def add_items_to_category(self, category_name, type_name, items):
        if category_name in self.categories:
            self.categories[category_name].add_items(type_name, items)
        else:
            print(f"Category '{category_name}' does not exist.")

    def __repr__(self):
        return "\n".join(str(cat) for cat in self.categories.values())


# Initialize CategoryManager
manager = CategoryManager()

# Add categories and items
manager.add_category('nintendo')
manager.add_items_to_category('nintendo', 'handheld', ['3ds', 'dsiw', 'dsi', 'ds'])
manager.add_items_to_category('nintendo', 'console', ['wii', 'wiiu', 'ns', 'gb', 'gba', 'nes', 'snes', 'gbc', 'n64', 'vb', 'gc', 'vc', 'ww'])

manager.add_category('pc')
manager.add_items_to_category('pc', 'desktop', ['linux', 'osx', 'pc'])
manager.add_items_to_category('pc', 'arcade', ['arc', 'all', 'fmt', 'c128', 'aco'])

manager.add_category('xbox')
manager.add_items_to_category('xbox', 'xbox360', ['x360'])
manager.add_items_to_category('xbox', 'xboxone', ['xone'])
manager.add_items_to_category('xbox', 'series', ['series'])
manager.add_items_to_category('xbox', 'xboxlive', ['xbl'])
manager.add_items_to_category('xbox', 'xboxoriginal', ['xb'])
manager.add_items_to_category('xbox', 'xboxseries', ['xs'])

manager.add_category('sony')
manager.add_items_to_category('sony', 'playstation', ['ps', 'ps2', 'ps3', 'ps4', 'ps5'])
manager.add_items_to_category('sony', 'portable', ['psp', 'psv'])
manager.add_items_to_category('sony', 'network', ['psn'])
manager.add_items_to_category('sony', 'other', ['cdi'])

manager.add_category('mobile')
manager.add_items_to_category('mobile', 'ios', ['ios'])
manager.add_items_to_category('mobile', 'android', ['and'])
manager.add_items_to_category('mobile', 'windows', ['winp'])
manager.add_items_to_category('mobile', 'nokia', ['ngage'])
manager.add_items_to_category('mobile', 'other', ['mob'])

manager.add_category('sega')
manager.add_items_to_category('sega', 'handheld', ['gg'])
manager.add_items_to_category('sega', 'home', ['msd', 'ms', 'gen', 'scd', 'sat', 's32x', 'dc'])

manager.add_category('atari')
manager.add_items_to_category('atari', 'classic', ['2600', '7800', '5200'])
manager.add_items_to_category('atari', 'jaguar', ['aj'])
manager.add_items_to_category('atari', 'interim', ['int'])

manager.add_category('commodore')
manager.add_items_to_category('commodore', 'amiga', ['amig'])
manager.add_items_to_category('commodore', 'c64', ['c64'])
manager.add_items_to_category('commodore', 'cd32', ['cd32'])

manager.add_category('other')
manager.add_items_to_category('other', 'ouya', ['ouya'])
manager.add_items_to_category('other', 'various', ['or', 'acpc', 'ast', 'apii', 'pce', 'zxs', 'lynx', 'ng', 'zxs', '3do', 'pcfx', 'ws', 'brw', 'cv', 'giz', 'msx', 'tg16', 'bbcm'])

# Example usage: print all categories
# print(manager)


### Explanation
-  **`Category` Class:** Represents a single category with a name and a dictionary of item types and their respective items.

- **`add_items()`:** Adds items to a specific type within the category.

- **`__repr__()`:** Provides a string representation of the category.

- **`CategoryManager` Class:** Manages multiple categories and provides methods to add new categories and items.

- **`add_category()`:** Adds a new category if it doesn’t already exist.

- **`add_items_to_category()`:** Adds items to an existing category.

- **`__repr__()`:**  Returns a string representation of all categories.
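
One behaviour worth noting: `add_items()` assigns the list to the type key, so calling it twice with the same `type_name` replaces the earlier items rather than appending to them. The sketch below repeats the `Category` class so that it runs on its own, and adds a hypothetical `extend_items()` method (not part of the original code) for the append case.

```python
class Category:
    def __init__(self, name):
        self.name = name
        self.items = {}

    def add_items(self, type_name, items):
        # Same as the original: overwrites any existing list for this type
        self.items[type_name] = items

    def extend_items(self, type_name, items):
        # Hypothetical addition: append to the existing list, creating it if needed
        self.items.setdefault(type_name, []).extend(items)


sony = Category('sony')
sony.add_items('portable', ['psp'])
sony.add_items('portable', ['psv'])      # replaces ['psp']
after_add = list(sony.items['portable'])

sony.extend_items('portable', ['psp'])   # appends to ['psv']
after_extend = list(sony.items['portable'])

print(after_add, after_extend)
```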


To convert the class-based structure to a DataFrame, you'll need to extract data from the `CategoryManager` and `Category` classes into a format suitable for tabular representation.

To do so, we have to modify the `CategoryManager` class.

We need to add the following method to the class:

```python
# Add this method inside the CategoryManager class
def to_dataframe(self):
    # Flatten the categories into a list of dictionaries
    data = []
    for cat_name, category in self.categories.items():
        for type_name, items in category.items.items():
            for item in items:
                data.append({'Category': cat_name, 'Type': type_name, 'Item': item})

    # Create a DataFrame from the list of dictionaries
    df = pd.DataFrame(data)
    return df
```


In [113]:
class CategoryManager:
    def __init__(self):
        self.categories = {}

    def add_category(self, name):
        if name not in self.categories:
            self.categories[name] = Category(name)
        else:
            print(f"Category '{name}' already exists.")

    def add_items_to_category(self, category_name, type_name, items):
        if category_name in self.categories:
            self.categories[category_name].add_items(type_name, items)
        else:
            print(f"Category '{category_name}' does not exist.")

    def __repr__(self):
        return "\n".join(str(cat) for cat in self.categories.values())

    def to_dataframe(self):
        # Flatten the categories into a list of dictionaries
        data = []
        for cat_name, category in self.categories.items():
            for type_name, items in category.items.items():
                for item in items:
                    data.append({'Category': cat_name, 'Type': type_name, 'Item': item})

        # Create a DataFrame from the list of dictionaries
        df = pd.DataFrame(data)
        return df

### Explanation

**Flattening the Data:**
- The `to_dataframe` method in the `CategoryManager` class iterates over each `Category` instance in the `categories` dictionary.
- For each category, it iterates over its types and their associated items.
- Each item is then appended to a list of dictionaries with keys `Category`, `Type`, and `Item`.

**Creating the DataFrame:**
- Convert the list of dictionaries to a Pandas DataFrame using `pd.DataFrame(data)`.

**Display the DataFrame:**
- Print the DataFrame to view the tabular representation of the hierarchical data.


In [114]:
class Category:
    def __init__(self, name):
        self.name = name
        self.items = {}

    def add_items(self, type_name, items):
        self.items[type_name] = items

    def __repr__(self):
        return f"Category(name={self.name}, items={self.items})"


class CategoryManager:
    def __init__(self):
        self.categories = {}

    def add_category(self, name):
        if name not in self.categories:
            self.categories[name] = Category(name)
        else:
            print(f"Category '{name}' already exists.")

    def add_items_to_category(self, category_name, type_name, items):
        if category_name in self.categories:
            self.categories[category_name].add_items(type_name, items)
        else:
            print(f"Category '{category_name}' does not exist.")

    def __repr__(self):
        return "\n".join(str(cat) for cat in self.categories.values())

    def to_dataframe(self):
        # Flatten the categories into a list of dictionaries
        data = []
        for cat_name, category in self.categories.items():
            for type_name, items in category.items.items():
                for item in items:
                    data.append({'Category': cat_name, 'Type': type_name, 'Item': item})

        # Create a DataFrame from the list of dictionaries
        df = pd.DataFrame(data)
        return df


# Initialize CategoryManager
manager = CategoryManager()

# Add categories and items
manager.add_category('nintendo')
manager.add_items_to_category('nintendo', 'handheld', ['3ds', 'dsiw', 'dsi', 'ds'])
manager.add_items_to_category('nintendo', 'console', ['wii', 'wiiu', 'ns', 'gb', 'gba', 'nes', 'snes', 'gbc', 'n64', 'vb', 'gc', 'vc', 'ww'])

manager.add_category('pc')
manager.add_items_to_category('pc', 'desktop', ['linux', 'osx', 'pc'])
manager.add_items_to_category('pc', 'arcade', ['arc', 'all', 'fmt', 'c128', 'aco'])

manager.add_category('xbox')
manager.add_items_to_category('xbox', 'xbox360', ['x360'])
manager.add_items_to_category('xbox', 'xboxone', ['xone'])
manager.add_items_to_category('xbox', 'series', ['series'])
manager.add_items_to_category('xbox', 'xboxlive', ['xbl'])
manager.add_items_to_category('xbox', 'xboxoriginal', ['xb'])
manager.add_items_to_category('xbox', 'xboxseries', ['xs'])

manager.add_category('sony')
manager.add_items_to_category('sony', 'playstation', ['ps', 'ps2', 'ps3', 'ps4', 'ps5'])
manager.add_items_to_category('sony', 'portable', ['psp', 'psv'])
manager.add_items_to_category('sony', 'network', ['psn'])
manager.add_items_to_category('sony', 'other', ['cdi'])

manager.add_category('mobile')
manager.add_items_to_category('mobile', 'ios', ['ios'])
manager.add_items_to_category('mobile', 'android', ['and'])
manager.add_items_to_category('mobile', 'windows', ['winp'])
manager.add_items_to_category('mobile', 'nokia', ['ngage'])
manager.add_items_to_category('mobile', 'other', ['mob'])

manager.add_category('sega')
manager.add_items_to_category('sega', 'handheld', ['gg'])
manager.add_items_to_category('sega', 'home', ['msd', 'ms', 'gen', 'scd', 'sat', 's32x', 'dc'])

manager.add_category('atari')
manager.add_items_to_category('atari', 'classic', ['2600', '7800', '5200'])
manager.add_items_to_category('atari', 'jaguar', ['aj'])
manager.add_items_to_category('atari', 'interim', ['int'])

manager.add_category('commodore')
manager.add_items_to_category('commodore', 'amiga', ['amig'])
manager.add_items_to_category('commodore', 'c64', ['c64'])
manager.add_items_to_category('commodore', 'cd32', ['cd32'])

manager.add_category('other')
manager.add_items_to_category('other', 'ouya', ['ouya'])
manager.add_items_to_category('other', 'various', ['or', 'acpc', 'ast', 'apii', 'pce', 'zxs', 'lynx', 'ng', 'zxs', '3do', 'pcfx', 'ws', 'brw', 'cv', 'giz', 'msx', 'tg16', 'bbcm'])

# Convert to DataFrame
df = manager.to_dataframe()

# Display the DataFrame
df


Unnamed: 0,Category,Type,Item
0,nintendo,handheld,3ds
1,nintendo,handheld,dsiw
2,nintendo,handheld,dsi
3,nintendo,handheld,ds
4,nintendo,console,wii
...,...,...,...
75,other,various,cv
76,other,various,giz
77,other,various,msx
78,other,various,tg16


# Recap

### 1. Regular Dictionary Structure
**Pros:**

- Simplicity: Easy to understand and implement for straightforward hierarchical data.
- Direct Access: Efficient for quick lookups and modifications.
- Compact Representation: Minimal code needed for defining and managing the data.

**Cons:**

- Scalability Issues: Can become unwieldy with more complex or larger datasets.
- Limited Metadata: Difficult to include additional metadata or complex relationships beyond basic key-value pairs.
- Manual Processing: Requires extra code to perform operations such as filtering or converting to other formats (like DataFrames).


### 2. Nested Dictionary Structure
**Pros:**

- Simplicity: Easy to understand and use for straightforward hierarchical data.
- Direct Access: Quick to access and modify data using dictionary keys.
- Compact Representation: Minimal code needed for defining and populating data.

**Cons:**

- Scalability Issues: As the complexity of data increases, maintaining and querying the dictionary can become cumbersome.
- Lack of Structure: Limited in expressing complex relationships or metadata beyond simple key-value pairs.
- Manual Conversion: Requires additional code to convert into other formats like DataFrames for advanced analysis or manipulation.

### 3. List of Dictionaries
**Pros:**

- Flexibility: Easy to represent complex structures with additional metadata.
- Compatibility: Directly compatible with many data processing tools and libraries, such as Pandas DataFrames.
- Simplicity in Processing: Easy to filter, sort, and manipulate using built-in methods or libraries.

**Cons:**

- Redundancy: Potential for duplicate data if not managed carefully, especially with repeated structures.
- Overhead: Slightly more verbose than a nested dictionary structure, with extra steps for creating and managing data.
- No Hierarchical Data: Less intuitive for representing hierarchical relationships compared to nested dictionaries or class-based structures.

### 4. Class-based Structure
**Pros:**

- Encapsulation: Encapsulates behavior and data, making it easy to manage and extend.
- Readability and Maintainability: Clear structure and separation of concerns, with methods for modifying and accessing data.
- Extensibility: Easy to add new features or modify existing ones without changing the overall structure.

**Cons:**

- Complexity: More complex to implement and understand compared to simpler data structures like dictionaries or lists.
- Overhead: Requires additional code to define classes and methods, which might be overkill for simpler use cases.
- Performance: Slightly more overhead in terms of memory and processing compared to simpler data structures.

### 5. DataFrame
**Pros:**

- Powerful Tools: Leverages Pandas' extensive functionality for data analysis, manipulation, and visualization.
- Tabular Representation: Naturally suited for tabular data, with built-in support for complex operations.
- Integration: Easily integrates with other data analysis tools and libraries.

**Cons:**

- Overhead: Requires additional libraries (like Pandas) and can introduce complexity if you only need basic data management.
- Learning Curve: May require additional learning and understanding of DataFrame operations and Pandas library.
- Performance: Handling very large DataFrames may require significant memory and processing power.
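
As a small illustration of what the DataFrame form makes easy, the sketch below counts items per category, filters by type, and checks for duplicated item codes. The `sample` frame is a hypothetical, shortened stand-in for the flattened `Category`/`Type`/`Item` table built earlier.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the flattened table
sample = pd.DataFrame({
    'Category': ['sony', 'sony', 'sony', 'sega', 'sega'],
    'Type': ['playstation', 'portable', 'portable', 'handheld', 'home'],
    'Item': ['ps4', 'psp', 'psv', 'gg', 'dc'],
})

# Count how many items each category contains
counts = sample.groupby('Category').size()

# Select the items of one type with a boolean mask
portables = sample.loc[sample['Type'] == 'portable', 'Item'].tolist()

# Check whether any item code appears more than once
has_duplicates = sample['Item'].duplicated().any()

print(counts.to_dict(), portables, has_duplicates)
```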

### Summary

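- Regular Dictionary Structure: Best for simple hierarchical data with straightforward access needs.
- Nested Dictionary Structure: Best for simple, static data where the relationships are clearly defined and the dataset is manageable in size.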
- List of Dictionaries: Suitable for flexible data manipulation and integration with data processing tools.
- Class-based Structure: Ideal for complex data with additional behaviors and encapsulation needs.
- DataFrame: Best for advanced data analysis, manipulation, and visualization, especially with large datasets.

***Each approach has its own strengths and is best suited for different scenarios. The choice depends on factors like data complexity, processing needs, and familiarity with the tools.***