### Election Data Use Case
[Source](https://github.com/azurede007/data-repo/blob/main/election_data_2014.csv) <br/>
The file contains the following fields:
- state: Name of the state where the constituency is located.
- constituency:  Name of the electoral constituency.
- candidate_name:  Name of the contesting candidate.
- sex:  Gender of the candidate (M/F).
- age:  Age of the candidate in years.
- category:  Candidate’s reservation category (e.g., ST, SC, General).
- partyname:  Name of the political party the candidate represents.
- partysymbol:  Symbol associated with the candidate’s political party.
- general: Number of votes the candidate received through general (non-postal) voting.
- postal: Number of votes the candidate received through postal ballots.
- total:  Total votes received by the candidate (general + postal).
- pct_of_total_votes:  Candidate’s votes as a percentage of total registered voters.
- pct_of_polled_votes:  Candidate’s votes as a percentage of total votes actually cast.
- totalvoters:  Total number of registered voters in the constituency.

### Basic Use Cases
- Read the election CSV file into a PySpark DataFrame
- Infer or define a custom schema for the dataset
- Display schema and preview data with printSchema() and show()
- Select specific columns (select) and rename them (withColumnRenamed)
- Filter rows based on conditions (e.g., candidates above a certain age)
- Drop unnecessary columns using drop()
- Handle missing values using na.fill() or na.drop()
- Cast columns to correct data types (e.g., age → Integer)
- Add a new column with withColumn() (e.g., a recomputed total = general + postal)
- Sort candidates by total votes in descending order
- Count total number of candidates, parties, or constituencies
- Group by state or partyname and count candidates
- Show distinct party names, categories, or constituencies
- Create a temporary view and query the data using Spark SQL

### Advanced Use Cases
- Add derived columns like vote_margin, turnout_percentage, and vote_share
- Validate data consistency (e.g., check if total = general + postal)
- Clean and standardize string columns (e.g., trim spaces in constituency)
- Calculate total votes by partyname per state
- Compute average age of candidates per party or constituency
- Calculate voter turnout percentage per constituency
- Find party-wise total votes and percentage share
- Identify winner per constituency
- Find top 3 candidates per constituency and calculate margin of victory
- Rank parties by total votes per state
- Create temp views and run SQL queries to analyze party performance
- Build an end-to-end ETL pipeline (ingest → transform → aggregate → load)
- Write aggregated results into Delta tables