<a href="https://colab.research.google.com/github/animesh-11/AI_ML/blob/main/EDA_Grade_Ques.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

You are a data analyst at a research institute studying the political ideologies of a group of citizens, with a particular focus on conservatism, capitalism, and environmentalism.


The common perception is that conservatism and capitalism are ideologies that prioritise economic growth, individual liberty, and limited government intervention. In contrast, environmentalism is often viewed as conflicting with these ideologies, as it generally advocates for more government regulation to protect the environment, which may clash with the free-market principles of capitalism. As a result, individuals who strongly identify with both conservatism and capitalism are typically seen as less likely to support environmentalism.


To analyse this further, write a Python function conditional_probability_of_support(selected_individuals) that takes a list of dictionaries as input. Each dictionary represents one individual's political beliefs: the keys are the ideologies, and the values are booleans indicating whether the individual identifies with or supports that ideology. The function should calculate the conditional probability that a randomly selected individual who identifies with both conservatism and capitalism also supports environmentalism.


Input format

A list of dictionaries representing the political beliefs of individuals. Each dictionary contains:

Three keys: 'conservatism', 'capitalism', and 'environmentalism'
The values (bool) for these keys, where True indicates support for the respective ideology and False indicates no support for the respective ideology

Output format

The conditional probability (float) that a randomly selected individual who identifies with both conservatism and capitalism also supports environmentalism

If no individuals identify with both conservatism and capitalism, the function should return 0.0

Constraints

The input list contains at least one individual
The computed probability is rounded to four decimals

Example Case 1

Input


[{'conservatism': True, 'capitalism': False, 'environmentalism': True}]


Output

0.0


Example Case 2

Input

[{'conservatism': True, 'capitalism': True, 'environmentalism': True}, {'conservatism': True, 'capitalism': True, 'environmentalism': False}, {'conservatism': True, 'capitalism': True, 'environmentalism': False}]


Output

0.3333
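
As a worked check of Example Case 2 above, the conditional probability can be written as a ratio of counts. The symbols here are notation introduced for illustration, not part of the original problem: C, K and E denote the events "identifies with conservatism", "identifies with capitalism" and "supports environmentalism", and N(·) counts the individuals in the input list for whom the event holds. All three individuals in Example Case 2 satisfy C and K, and one of them also satisfies E:

```latex
P(E \mid C \cap K) = \frac{N(C \cap K \cap E)}{N(C \cap K)} = \frac{1}{3} \approx 0.3333
```

When N(C ∩ K) = 0, as in Example Case 1, the ratio is undefined, which is why the problem specifies a return value of 0.0 for that case.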

def conditional_probability_of_support(selected_individuals):
    """
    Calculates the conditional probability that a randomly selected individual
    who identifies with both conservatism and capitalism also supports environmentalism.

    Args:
        selected_individuals (list): A list of dictionaries, where each dictionary
                                     represents an individual's political beliefs.
                                     Keys are 'conservatism', 'capitalism',
                                     'environmentalism', with boolean values.

    Returns:
        float: The conditional probability, rounded to four decimal places.
               Returns 0.0 if no individuals identify with both conservatism and capitalism.
    """
    count_conserv_cap = 0
    count_conserv_cap_env = 0

    for individual in selected_individuals:
        if individual['conservatism'] and individual['capitalism']:
            count_conserv_cap += 1
            if individual['environmentalism']:
                count_conserv_cap_env += 1

    if count_conserv_cap == 0:
        return 0.0

    probability = count_conserv_cap_env / count_conserv_cap
    return round(probability, 4)

from ast import literal_eval
print(round(conditional_probability_of_support(literal_eval(input())), 4))

[{'conservatism': True, 'capitalism': False, 'environmentalism': True}]
0.0
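
For comparison, here is a more compact sketch of the same calculation that can be called directly instead of reading from standard input. The name conditional_probability_compact is hypothetical and chosen only for this illustration; it assumes the same input format as above and is an alternative formulation, not a replacement for the solution cell.

```python
def conditional_probability_compact(selected_individuals):
    """Hypothetical compact variant of the conditional probability calculation."""
    # Collect the environmentalism flag of everyone who identifies with
    # both conservatism and capitalism (the conditioning group).
    env_flags = [
        person['environmentalism']
        for person in selected_individuals
        if person['conservatism'] and person['capitalism']
    ]

    # No one in the conditioning group: return 0.0 as the problem specifies.
    if not env_flags:
        return 0.0

    # Booleans sum as 0/1, so sum(env_flags) counts the supporters.
    return round(sum(env_flags) / len(env_flags), 4)


# The two example cases from the problem statement.
example_1 = [{'conservatism': True, 'capitalism': False, 'environmentalism': True}]
example_2 = [
    {'conservatism': True, 'capitalism': True, 'environmentalism': True},
    {'conservatism': True, 'capitalism': True, 'environmentalism': False},
    {'conservatism': True, 'capitalism': True, 'environmentalism': False},
]

print(conditional_probability_compact(example_1))  # 0.0
print(conditional_probability_compact(example_2))  # 0.3333
```

Filtering first and then summing the boolean flags keeps the numerator and denominator tied to the same filtered list.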


You are working as a data analyst for a public election commission. Your task is to help ensure the integrity of election data by removing invalid votes from the provided voting dataset.


You are given a dataset with 20 rows and 3 columns containing voter records for a recent local election. Each row represents a single vote submission. Your task is to implement a function remove_illegal_votes(df) that takes an input data frame and cleans it by identifying and removing invalid votes based on predefined rules.


The function should examine the dataset for any entries that violate the voting rules, such as voters casting multiple votes, voting for both candidates, or other discrepancies. Once the illegal votes are identified, the function should remove these rows from the data frame and return the cleaned data frame that only contains valid votes.


Note: In the data, a vote is represented by 1 and not voting by 0.


Data Description

The dataset contains the following columns:


'voter_id': A unique identifier (int) for each voter, for example, 101, 102, 103, ..., 118
'candidate_A': Whether the voter voted for candidate A (int)
'candidate_B': Whether the voter voted for candidate B (int)

Vote Validation Rules

A vote is considered illegal if it satisfies any of the following conditions:


The voter has voted for both candidates in a single row
The voter ID appears in multiple rows and has voted at least once across those rows

Note: A voter ID that appears multiple times but did not vote at all is considered valid, but only one row should be retained for that voter
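
To make the rules concrete, the following is a minimal plain-Python sketch applied to a small made-up set of records. The toy data and the helper name clean_votes are hypothetical and exist only for this illustration; they are not the actual dataset or the required pandas function. The sketch assumes Rule 1 is applied first and that duplicates are then judged among the remaining rows.

```python
from collections import defaultdict

# Hypothetical toy records: (voter_id, candidate_A, candidate_B)
toy_votes = [
    (101, 1, 0),  # single valid vote
    (102, 1, 1),  # voted for both candidates: illegal
    (103, 0, 1),  # voter 103 appears twice and voted once: both rows illegal
    (103, 0, 0),
    (104, 0, 0),  # voter 104 appears twice but never voted: keep one row
    (104, 0, 0),
]


def clean_votes(rows):
    """Apply the validation rules to a list of (voter_id, A, B) tuples."""
    # Rule 1: drop rows where both candidates received a vote.
    remaining = [r for r in rows if not (r[1] == 1 and r[2] == 1)]

    # Count rows per voter and record whether the voter voted in any row.
    rows_per_voter = defaultdict(int)
    voted = defaultdict(bool)
    for voter_id, a, b in remaining:
        rows_per_voter[voter_id] += 1
        voted[voter_id] = voted[voter_id] or a == 1 or b == 1

    cleaned = []
    seen = set()
    for row in remaining:
        voter_id = row[0]
        # Rule 2: a repeated voter ID with at least one vote loses all rows.
        if rows_per_voter[voter_id] > 1 and voted[voter_id]:
            continue
        # Note: a repeated voter ID with no votes keeps only its first row.
        if voter_id in seen:
            continue
        seen.add(voter_id)
        cleaned.append(row)
    return cleaned


print(clean_votes(toy_votes))  # [(101, 1, 0), (104, 0, 0)]
```

Only voter 101's vote and one of voter 104's non-voting rows survive, and the surviving rows stay in their original order.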


Input Format

A tuple containing, in order:

row_index (int): a row number from 0 to 11 (both inclusive) in the cleaned data frame
column_index (int): the position of the candidate column, 1 for 'candidate_A' or 2 for 'candidate_B'

Output Format

The value (int) at that row and column of the cleaned data frame, representing the vote

Constraints

There are no corrupt or null values in the data
There are no data type inconsistencies in the data
The rows and columns of the data frame are not sorted or rearranged; their original order is preserved

Example Case 1

Input

(0, 1)


Output

1


Example Case 2


Input


(1, 2)


Output

1

import pandas as pd
from ast import literal_eval

def remove_illegal_votes(df: pd.DataFrame) -> pd.DataFrame:
    """
    Cleans a DataFrame of voter records by removing illegal votes based on predefined rules.

    Args:
        df (pd.DataFrame): The input DataFrame containing voter records with
                           'voter_id', 'candidate_A', and 'candidate_B' columns.

    Returns:
        pd.DataFrame: A new DataFrame with illegal votes removed, containing only valid votes.
    """
    df_cleaned = df.copy()

    # Rule 1: A vote is considered illegal if the voter has voted for both candidates in a single row
    illegal_votes_both_candidates_mask = (df_cleaned['candidate_A'] == 1) & (df_cleaned['candidate_B'] == 1)
    df_cleaned = df_cleaned[~illegal_votes_both_candidates_mask]

    # --- Rule 2 and the note on non-voting duplicates: handle repeated voter IDs ---

    # Identify all voter IDs that appear more than once in the DataFrame after Rule 1.
    duplicate_voter_ids_post_rule1 = df_cleaned['voter_id'][df_cleaned['voter_id'].duplicated(keep=False)].unique()

    # For the remaining DataFrame, group by voter_id and determine which voter_ids
    # have cast at least one vote (candidate_A=1 or candidate_B=1 in any of their rows).
    voted_status_per_voter = df_cleaned.groupby('voter_id')[['candidate_A', 'candidate_B']].any()
    voted_at_all_true_voters = voted_status_per_voter[voted_status_per_voter['candidate_A'] | voted_status_per_voter['candidate_B']].index

    # Identify voter_ids that are duplicated (post Rule 1) AND have voted at least once.
    voter_ids_to_remove_all_entries = set(duplicate_voter_ids_post_rule1).intersection(set(voted_at_all_true_voters))

    # Filter out all rows from the DataFrame that belong to the identified voter IDs.
    df_cleaned = df_cleaned[~df_cleaned['voter_id'].isin(voter_ids_to_remove_all_entries)]

    # For any remaining voter_ids that still appear multiple times (meaning they never voted),
    # use drop_duplicates to keep only the first occurrence (per the note on non-voting duplicates).
    df_cleaned = df_cleaned.drop_duplicates(subset=['voter_id'], keep='first')

    # Reset the index of the resulting DataFrame.
    return df_cleaned.reset_index(drop=True)

# Input and output processing
filename = 'https://d3ejq4mxgimsmf.cloudfront.net/votes-5d1c9ce4560d4bcbace2742f26ca9c24.csv'
df = pd.read_csv(filename)
x, y = literal_eval(input())
print(int(remove_illegal_votes(df).iloc[x, y]))

(1, 2)
1
