## 60 Core Technical Questions and Answers to Know Before Your Mock Interviews

### Interview Questions based on the projects mentioned in the resume

### Tailored interview questions and answers based on a project in which I built a comprehensive revenue forecasting model using Python and scikit-learn

### Project Understanding and Overview

#### 1. Can you explain the objective of your revenue forecasting project?
#### 	•	Answer:
#### 	•	The objective was to build a machine learning model that accurately predicts future revenue based on historical data, seasonal trends, and other influential factors. This helps businesses optimize resource allocation, inventory management, and strategic planning.


#### 2. What were the key steps in building the forecasting model?
#### 	•	Answer:
#### 	1.	Understanding the problem and gathering requirements.
#### 	2.	Collecting and preprocessing historical data.
#### 	3.	Performing exploratory data analysis (EDA) to uncover trends, seasonality, and anomalies.
#### 	4.	Feature engineering to derive meaningful predictors (e.g., lag features, moving averages).
#### 	5.	Splitting data chronologically into training, validation, and test sets, so the model is always evaluated on periods later than those it was trained on.
#### 	6.	Selecting and tuning machine learning models (e.g., linear regression, decision trees, or ensemble methods).
#### 	7.	Evaluating model performance using metrics like MAE, RMSE, and MAPE.
#### 	8.	Deploying the model for real-time or batch predictions.

### Data Handling and Preprocessing


#### 3. What kind of data did you use for revenue forecasting?
#### 	•	Answer:
#### 	•	Historical revenue data.
#### 	•	Time-related features (e.g., day of the week, month, year).
#### 	•	External variables (e.g., economic indicators, holidays, weather conditions).
#### 	•	Derived features like lag revenue, moving averages, and growth rates.

#### 4. How did you handle missing data in your dataset?
#### 	•	Answer:
#### 	•	Depending on the context:
#### 	•	Imputed missing values with mean/median for numerical features or mode for categorical features.
#### 	•	Used forward-fill/backward-fill for time-series data.
#### 	•	Dropped records if the percentage of missing data was significant and imputation wasn’t feasible.
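
A minimal sketch of the imputation options above, assuming a hypothetical daily series with a `revenue` column and an `ad_spend` feature (both names and all values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical daily revenue series with gaps (values are made up).
dates = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "revenue": [100.0, np.nan, 120.0, np.nan, np.nan, 150.0],
        "ad_spend": [10.0, 12.0, np.nan, 11.0, 13.0, 14.0],
    },
    index=dates,
)

# Time-ordered column: carry the last known value forward, then
# back-fill only to cover a possible gap at the very start.
df["revenue_filled"] = df["revenue"].ffill().bfill()

# Numerical feature without strong time structure: use the median.
df["ad_spend_filled"] = df["ad_spend"].fillna(df["ad_spend"].median())

print(df)
```

Forward-fill only uses past values, which suits forecasting. Back-fill borrows from the future, so it is worth restricting to cases where that leakage is acceptable.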

#### 5. What feature engineering techniques did you apply?
#### 	•	Answer:
#### 	•	Created lag features to capture revenue from previous periods.
#### 	•	Generated rolling statistics (e.g., moving averages, rolling standard deviation).
#### 	•	Encoded time-related features like day of the week, month, and quarter.
#### 	•	Derived external factors such as holiday indicators and promotion effects.
#### 	•	Scaled numerical features using StandardScaler or MinMaxScaler for models sensitive to scale.
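
The lag, rolling, and calendar features above could be built roughly as follows. This is a sketch on made-up data; the column names (`lag_1`, `roll_mean_3`, and so on) are illustrative choices, not names from the project.

```python
import numpy as np
import pandas as pd

# Hypothetical daily revenue (made-up numbers) indexed by date.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
df = pd.DataFrame({"revenue": np.arange(100.0, 200.0, 10.0)}, index=dates)

# Lag features: revenue observed 1 and 7 days earlier.
df["lag_1"] = df["revenue"].shift(1)
df["lag_7"] = df["revenue"].shift(7)

# Rolling statistics computed on shifted data so that each row only
# sees values from earlier days (avoids leaking the current target).
past = df["revenue"].shift(1)
df["roll_mean_3"] = past.rolling(window=3).mean()
df["roll_std_3"] = past.rolling(window=3).std()

# Calendar features derived from the DatetimeIndex.
df["day_of_week"] = df.index.dayofweek  # Monday=0
df["month"] = df.index.month
df["quarter"] = df.index.quarter

# Rows at the start lack history; drop them before training.
features = df.dropna()
print(features.head())
```

Shifting before rolling is a design choice: it keeps the current day's revenue out of its own predictors.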

### Model Selection and Evaluation


#### 6. Which machine learning algorithms did you try, and why?
#### 	•	Answer:
#### 	•	Linear Regression: To establish a baseline and for its interpretability.
#### 	•	Random Forest/Gradient Boosting (e.g., XGBoost, LightGBM): To handle non-linear relationships and interactions between features.

	
   

#### 7. What metrics did you use to evaluate model performance, and why?
#### 	•	Answer:
#### 	•	Mean Absolute Error (MAE): Easy to interpret (same units as revenue) and less sensitive to outliers than RMSE.
#### 	•	Root Mean Squared Error (RMSE): Penalizes large errors more heavily, useful when big misses are especially costly.
#### 	•	Mean Absolute Percentage Error (MAPE): Expresses error as a percentage for interpretability, but is undefined when an actual value is zero.
#### 	•	Used time-ordered cross-validation (backtesting) rather than random K-fold, so each fold is scored only on data later than its training window.
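
A sketch of backtesting with these metrics, assuming synthetic data and scikit-learn's `TimeSeriesSplit`. The linear model and the trend-plus-noise series are placeholders, not the project's actual model or data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-ordered data: a linear trend plus noise.
rng = np.random.default_rng(0)
t = np.arange(120)
X = t.reshape(-1, 1)
y = 200.0 + 2.0 * t + rng.normal(0.0, 5.0, size=t.size)

fold_scores = []
# Each split trains on an initial segment and tests on the block after it.
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    assert train_idx.max() < test_idx.min()  # no future data in training
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    mae = mean_absolute_error(y[test_idx], pred)
    rmse = np.sqrt(mean_squared_error(y[test_idx], pred))
    mape = np.mean(np.abs((y[test_idx] - pred) / y[test_idx])) * 100
    fold_scores.append((mae, rmse, mape))

for i, (mae, rmse, mape) in enumerate(fold_scores, start=1):
    print(f"fold {i}: MAE={mae:.2f} RMSE={rmse:.2f} MAPE={mape:.2f}%")
```

RMSE is never smaller than MAE on the same fold, so a large gap between the two hints at a few unusually large errors.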

#### 8. How did you handle overfitting in your model?
#### 	•	Answer:
#### 	•	Used techniques such as:
#### 	•	Regularization (e.g., Ridge, Lasso).
#### 	•	Pruning (for tree-based models).
#### 	•	Cross-validation to ensure generalization.
#### 	•	Feature selection to remove irrelevant features.
#### 	•	Tuning hyperparameters.

#### 9. Did you encounter any challenges while working on this project? How did you resolve them?
#### 	•	Answer:
#### 	•	Challenge 1: Insufficient data for specific periods.
#### Solution: Augmented data with external datasets or used interpolation techniques.
#### 	•	Challenge 2: High variance in revenue data.
#### Solution: Applied smoothing techniques and used robust algorithms like gradient boosting.
#### 	•	Challenge 3: Model drift in real-time predictions.
#### Solution: Monitored model performance and implemented periodic retraining.

### Deployment and Business Impact

#### 10. What was the business impact of your model?
#### 	•	Answer:
#### 	•	Improved revenue prediction accuracy by 15%.
#### 	•	Enabled better inventory and resource management, reducing costs by 20%.
#### 	•	Enhanced decision-making for promotions and pricing strategies.

### General ETL Process Questions


#### 11. What does ETL stand for, and how did you implement it in your project?
#### 	•	Answer:
#### 	•	ETL stands for Extract, Transform, and Load:
#### 	1.	Extract: Loaded data from sources such as CSV files, databases, and APIs, using pandas readers like pandas.read_csv().
#### 	2.	Transform: Cleansed, aggregated, and transformed data using pandas operations such as .groupby(), .merge(), and .apply().
#### 	3.	Load: Saved the processed data back to databases or files using to_csv() or database connectors.
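
The three stages can be sketched end to end as below. The in-memory CSV, the column names, and the SQLite target are assumptions chosen so the example runs anywhere. A real pipeline would point at actual files and a production database.

```python
import io
import sqlite3

import pandas as pd

# Extract: an in-memory CSV stands in for a real file or API response.
raw_csv = io.StringIO(
    "order_id,region,amount\n"
    "1,North,100.0\n"
    "2,South,250.5\n"
    "2,South,250.5\n"   # duplicate row
    "3,North,\n"        # missing amount
    "4,South,80.0\n"
)
orders = pd.read_csv(raw_csv)

# Transform: remove duplicates, drop rows without an amount, aggregate.
clean = orders.drop_duplicates().dropna(subset=["amount"])
revenue_by_region = (
    clean.groupby("region", as_index=False)["amount"].sum()
    .rename(columns={"amount": "total_revenue"})
)

# Load: write the result to a SQLite table (in-memory for the demo).
conn = sqlite3.connect(":memory:")
revenue_by_region.to_sql("revenue_by_region", conn, index=False, if_exists="replace")
loaded = pd.read_sql_query("SELECT * FROM revenue_by_region ORDER BY region", conn)
conn.close()
print(loaded)
```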

#### 12. How did you handle data extraction from multiple sources?
#### 	•	Answer:
#### 	•	Connected to relational databases using SQLAlchemy.
#### 	•	Extracted data from REST APIs using the requests library and parsed the JSON responses.
#### 	•	Used pandas functions like read_csv(), read_excel(), or read_sql_query() for file-based and SQL-based extractions.


#### 13. How do you ensure data consistency during extraction?
#### 	•	Answer:
#### 	•	Validated source data types and formats with pandas (e.g., comparing df.dtypes against the expected schema).
#### 	•	Handled schema differences between sources by normalizing column names and data types.
#### 	•	Logged extraction errors for records that couldn’t be processed, ensuring they could be reviewed and reprocessed.

#### 14. What transformations did you perform on the raw data?
#### 	•	Answer:
#### 	•	Removed duplicates using drop_duplicates().
#### 	•	Handled missing data by:
#### 	•	Filling with mean/median/mode using fillna().
#### 	•	Dropping rows/columns using dropna().
#### 	•	Normalized and standardized data with pandas and sklearn.preprocessing.
#### 	•	Merged datasets using merge() and joined tables with common keys.
#### 	•	Aggregated data using .groupby() to compute metrics like sum, average, and counts.

#### 15. How did you ensure data quality during transformation?
#### 	•	Answer:
#### 	•	Applied data validation rules like checking for outliers and setting constraints for acceptable ranges.
#### 	•	Used assertions in Python to enforce rules, e.g., assert df['value'].notnull().all().
#### 	•	Applied regular expressions (str.contains() or str.match()) to validate string patterns like email or phone formats.
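
A small sketch of these checks on a hypothetical table. The `value` range of 0 to 100 and the simplified email pattern are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical customer records; the column names are illustrative.
df = pd.DataFrame(
    {
        "value": [10.0, 25.0, 400.0, 15.0],
        "email": ["a@example.com", "bad-address", "c@example.org", "d@example.net"],
    }
)

# Rule 1: no nulls in a required column.
assert df["value"].notnull().all(), "value contains nulls"

# Rule 2: values must fall inside an agreed range (0 to 100 here).
in_range = df["value"].between(0, 100)

# Rule 3: a deliberately simple email pattern; real validation is stricter.
valid_email = df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Collect failing rows for review instead of silently dropping them.
rejected = df[~(in_range & valid_email)]
accepted = df[in_range & valid_email]
print(rejected)
```

Keeping the rejected rows in their own frame makes it easy to log them and reprocess later.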

#### 16. What steps did you take to optimize the data loading process?
#### 	•	Answer:
#### 	•	Used bulk inserts when loading data into databases to reduce transaction overhead.
#### 	•	Compressed large datasets into .parquet format for efficient storage and retrieval.
#### 	•	Ensured transactional integrity by committing or rolling back changes in case of failures.

#### 17. How did you load the cleansed data into the destination?
#### 	•	Answer:
#### 	•	Used to_csv() and to_parquet() for saving data to files.
#### 	•	Used to_sql() with SQLAlchemy to insert data into relational databases.
#### 	•	Scheduled batch processing and automated loading with tools like Apache Airflow or cron jobs.

#### 18. How did you handle performance issues with pandas while processing large datasets?
#### 	•	Answer:
#### 	•	Read data in chunks (the chunksize argument of readers such as read_csv()) to process it in smaller portions.
#### 	•	Used parallel processing libraries like dask or modin for distributed data processing.
#### 	•	Reduced memory footprint by explicitly specifying data types (pd.to_numeric() or .astype()).
#### 	•	Replaced Python loops and row-wise .apply() calls with vectorized column operations. Note that .applymap() (renamed DataFrame.map() in pandas 2.1) still calls a Python function per element, so it is not a faster substitute for vectorized code.
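
The chunking, dtype, and vectorization points can be illustrated as follows. The in-memory CSV is a stand-in for a large file, and the column names are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Build a small CSV in memory to stand in for a large file on disk.
n_rows = 10_000
source = pd.DataFrame(
    {
        "store": np.tile(["A", "B", "C", "D"], n_rows // 4),
        "units": np.arange(n_rows) % 50,
        "price": 2.5,
    }
)
csv_text = source.to_csv(index=False)

# 1) Chunking: aggregate piece by piece instead of loading everything.
total_units = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2_000):
    total_units += int(chunk["units"].sum())

# 2) Narrower dtypes: declare them at read time to cut memory use.
default_df = pd.read_csv(io.StringIO(csv_text))
compact_df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"store": "category", "units": "int16", "price": "float32"},
)
mem_default = default_df.memory_usage(deep=True).sum()
mem_compact = compact_df.memory_usage(deep=True).sum()

# 3) Vectorization: one column-wise expression instead of a Python loop.
compact_df["revenue"] = compact_df["units"] * compact_df["price"]

print(total_units, mem_default, mem_compact)
```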

### Interactive Dashboard Tableau Questions

#### 19. What are the key steps involved in creating an interactive Tableau dashboard?
#### 	•	Answer:
#### 	1.	Connect to a data source and import data into Tableau.
#### 	2.	Prepare and clean the data using Tableau Prep or in Tableau Desktop.
#### 	3.	Create individual visualizations (e.g., charts, maps).
#### 	4.	Combine visualizations into a dashboard.
#### 	5.	Add interactivity with filters, actions, and parameters.
#### 	6.	Customize the layout for user experience (e.g., responsive design).
#### 	7.	Publish the dashboard to Tableau Server or Tableau Cloud (formerly Tableau Online) for sharing.

#### 20. What types of filters can you use in a Tableau dashboard?
#### 	•	Answer:
#### 	•	Quick Filters: Allow users to filter views interactively.
#### 	•	Global Filters: Apply filters across multiple worksheets on the dashboard.
#### 	•	Context Filters: Run before other filters, so the remaining filters are evaluated only on the rows that pass the context filter.
#### 	•	Extract Filters: Filter data while creating an extract to improve performance.


#### 21. How do you add interactivity to a Tableau dashboard?
#### 	•	Answer:
#### 	•	Use filters to allow users to narrow down data.
#### 	•	Add dashboard actions like:
#### 	•	Filter Actions: Dynamically filter data across sheets.
#### 	•	Highlight Actions: Highlight related data in other views.
#### 	•	URL Actions: Link to external web pages.
#### 	•	Use parameters to let users dynamically change input values.
#### 	•	Enable drill-downs by organizing data hierarchically (e.g., Category → Subcategory).


#### 22. How do you ensure your Tableau dashboard is user-friendly?
#### 	•	Answer:
#### 	•	Keep the layout simple and intuitive with a clear visual hierarchy.
#### 	•	Use consistent color schemes and labels.
#### 	•	Include tooltips and legends for better context.
#### 	•	Add titles and descriptions to guide users.
#### 	•	Test the dashboard with end-users to gather feedback on usability.

### SQL Questions

#### 23. What is SQL, and what are its main types of commands?
#### 	•	Answer:
#### SQL (Structured Query Language) is used to communicate with and manage databases. It includes:
#### 	•	DDL (Data Definition Language): CREATE, ALTER, DROP
#### 	•	DML (Data Manipulation Language): INSERT, UPDATE, DELETE
#### 	•	DQL (Data Query Language): SELECT
#### 	•	TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT
#### 	•	DCL (Data Control Language): GRANT, REVOKE

#### 24. How do you retrieve unique values from a column?
#### 	•	Query:
        SELECT DISTINCT column_name
        FROM table_name;

#### 25. How do you write an SQL query to find the second highest salary?
#### 	•	Query:
             SELECT MAX(salary) AS second_highest_salary
             FROM employees
             WHERE salary < (SELECT MAX(salary) FROM employees);
             

#### 26. Explain the difference between INNER JOIN, LEFT JOIN, and RIGHT JOIN.
#### 	•	Answer:
#### 	•	INNER JOIN: Returns rows with matching values in both tables.
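#### 	•	LEFT JOIN: Returns all rows from the left table and the matching rows from the right table; right-side columns are NULL where there is no match.
#### 	•	RIGHT JOIN: Returns all rows from the right table and the matching rows from the left table; left-side columns are NULL where there is no match.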
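
The same three join types can be tried out in pandas, where `merge(..., how=...)` mirrors the SQL semantics. The tables below are made up, with one employee in a department that does not exist and one department without employees.

```python
import pandas as pd

# Two tiny, made-up tables sharing the key department_id.
employees = pd.DataFrame(
    {"employee_name": ["Ana", "Ben", "Cy"], "department_id": [1, 2, 99]}
)
departments = pd.DataFrame(
    {"department_id": [1, 2, 3], "department_name": ["IT", "HR", "Ops"]}
)

inner = employees.merge(departments, on="department_id", how="inner")
left = employees.merge(departments, on="department_id", how="left")
right = employees.merge(departments, on="department_id", how="right")

print(len(inner), len(left), len(right))  # 2 3 3
```

In `left`, Cy keeps a row with a missing `department_name`. In `right`, the Ops department keeps a row with a missing `employee_name`.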

#### 27. What is a subquery, and how is it used?
#### 	•	Answer:
#### A subquery is a query within another query, often used to filter results.
#### 	•	Example:
              SELECT employee_name
              FROM employees
              WHERE department_id = (
                  SELECT department_id
                  FROM departments
                  WHERE department_name = 'IT'
                  );

#### 28. Explain window functions in SQL.
#### 	•	Answer:
#### Window functions perform calculations across a set of rows related to the current row without collapsing them into a single output row, as GROUP BY would.
#### 	•	Example:
                    SELECT employee_name, department_id, salary,
                           SUM(salary) OVER (PARTITION BY department_id) AS department_total
                    FROM employees;
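
For readers more comfortable in pandas, a rough equivalent of the window query is `groupby(...).transform(...)`, which also returns one value per input row. The data below is made up, and the ranking column is an extra illustration that is not part of the SQL example.

```python
import pandas as pd

employees = pd.DataFrame(
    {
        "employee_name": ["Ana", "Ben", "Cy", "Dee"],
        "department_id": [1, 1, 2, 2],
        "salary": [5000, 7000, 4000, 6000],
    }
)

# Equivalent of SUM(salary) OVER (PARTITION BY department_id):
# transform() returns one value per input row, so no rows are collapsed.
employees["department_total"] = (
    employees.groupby("department_id")["salary"].transform("sum")
)

# Equivalent of RANK() OVER (PARTITION BY department_id ORDER BY salary DESC).
employees["salary_rank"] = (
    employees.groupby("department_id")["salary"]
    .rank(method="min", ascending=False)
    .astype(int)
)
print(employees)
```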

#### 29. Explain the difference between UNION and UNION ALL.
#### 	•	Answer:
#### 	•	UNION: Combines results from two queries and removes duplicates.
#### 	•	UNION ALL: Combines results and includes duplicates.
#### 	•	Example:
               -- Using UNION
              SELECT employee_name FROM employees_2023
              UNION
              SELECT employee_name FROM employees_2022;

              -- Using UNION ALL
             SELECT employee_name FROM employees_2023
             UNION ALL
             SELECT employee_name FROM employees_2022;
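
The difference is easy to verify with Python's built-in `sqlite3` module. The sketch below assumes two small made-up tables in which 'Ben' appears in both years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees_2022 (employee_name TEXT);
    CREATE TABLE employees_2023 (employee_name TEXT);
    INSERT INTO employees_2022 VALUES ('Ana'), ('Ben');
    INSERT INTO employees_2023 VALUES ('Ben'), ('Cy');
    """
)

# UNION removes the duplicate 'Ben'.
union_rows = conn.execute(
    "SELECT employee_name FROM employees_2023 "
    "UNION "
    "SELECT employee_name FROM employees_2022"
).fetchall()

# UNION ALL keeps every row from both queries.
union_all_rows = conn.execute(
    "SELECT employee_name FROM employees_2023 "
    "UNION ALL "
    "SELECT employee_name FROM employees_2022"
).fetchall()
conn.close()

print(len(union_rows), len(union_all_rows))  # 3 4
```

Because UNION has to deduplicate, UNION ALL is usually the cheaper choice when duplicates are impossible or acceptable.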

### Basic MySQL Questions

#### 30. What is MySQL?
#### 	•	Answer:
#### MySQL is an open-source relational database management system (RDBMS) based on Structured Query Language (SQL). It is widely used for web applications and supports various operating systems.

#### 31. What are the different storage engines in MySQL?
#### 	•	Answer:
#### MySQL supports several storage engines, including:
#### 	•	InnoDB: Default engine, supports transactions and foreign keys.
#### 	•	MyISAM: Fast for read-heavy workloads but lacks transactions.
#### 	•	Memory: Stores data in RAM for faster access.
#### 	•	CSV: Stores data in plain text files in CSV format.
#### 	•	Archive: Optimized for high data compression and storage.

#### 32. What are MySQL’s constraints?
#### 	•	Answer:
#### Constraints are rules enforced on table columns to ensure data integrity:
#### 	•	NOT NULL: Prevents null values.
#### 	•	UNIQUE: Ensures unique values in a column.
#### 	•	PRIMARY KEY: Combines NOT NULL and UNIQUE.
#### 	•	FOREIGN KEY: Establishes relationships between tables.
#### 	•	CHECK: Ensures values meet specific conditions (enforced from MySQL 8.0.16; earlier versions parse the clause but ignore it).
#### 	•	DEFAULT: Assigns default values to a column.

### Regression Analysis Questions

#### 33. What is regression analysis?
#### 	•	Answer:
#### Regression analysis is a statistical method used to determine the relationship between a dependent variable (target) and one or more independent variables (predictors). It is widely used for prediction, forecasting, and understanding the relationships in data.

#### 34. What is the difference between linear regression and logistic regression?
#### 	•	Answer:
#### 	•	Linear Regression:
#### 	•	Predicts continuous outcomes.
#### 	•	Output: Real numbers (e.g., sales, temperature).
#### 	•	Equation: $Y = \beta_0 + \beta_1 X + \epsilon$.
#### 	•	Logistic Regression:
#### 	•	Predicts binary or categorical outcomes.
#### 	•	Output: Probabilities (e.g., spam/not spam).
#### 	•	Equation: $P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$.
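
A short sketch contrasting the two models in scikit-learn. The data is synthetic: the true coefficients (3 and 2) and the threshold of 5 for the binary label are assumptions made for the demo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))

# Continuous target for linear regression: Y = 3 + 2X + noise.
y_cont = 3.0 + 2.0 * X[:, 0] + rng.normal(0.0, 0.5, size=200)
lin = LinearRegression().fit(X, y_cont)

# Binary target for logistic regression: 1 when X exceeds 5.
y_bin = (X[:, 0] > 5).astype(int)
log = LogisticRegression().fit(X, y_bin)

x_new = np.array([[8.0]])
print(lin.predict(x_new))        # a real number
print(log.predict_proba(x_new))  # [P(Y=0), P(Y=1)]
print(log.predict(x_new))        # class label
```

The linear model returns an unbounded number, while the logistic model returns probabilities that `predict` turns into a class using a 0.5 cutoff.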

#### 35. How do you evaluate the performance of a regression model?
#### 	•	Answer:
#### Common metrics include:
#### 	•	Mean Absolute Error (MAE): $\frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
#### 	•	Mean Squared Error (MSE): $\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
#### 	•	Root Mean Squared Error (RMSE): $\sqrt{\text{MSE}}$
#### 	•	R-Squared (Coefficient of Determination): Proportion of variance explained by the model.
#### 	•	Adjusted R-Squared: Adjusted for the number of predictors in the model.
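
These metrics can be computed on a tiny hand-checkable example. The numbers are made up, and `p = 1` predictor is assumed for the adjusted R-squared.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])
n, p = len(y_true), 1  # p = number of predictors (assumed to be 1 here)

mae = mean_absolute_error(y_true, y_pred)   # (1 + 0 + 1 + 2) / 4
mse = mean_squared_error(y_true, y_pred)    # (1 + 0 + 1 + 4) / 4
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)               # 1 - SS_res / SS_tot = 1 - 6 / 20
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(mae, mse, rmse, r2, adj_r2)
```

Adjusted R-squared is lower than R-squared here because it discounts the fit for the number of predictors used.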

#### 36. What is regularization in regression, and why is it important?
#### 	•	Answer:
#### Regularization adds a penalty term to the loss function to reduce overfitting and improve generalization:
#### 	•	L1 Regularization (Lasso): Shrinks coefficients and can set some to zero, effectively performing feature selection.
#### 	•	L2 Regularization (Ridge): Shrinks coefficients but does not set any to zero.
#### 	•	Lasso Loss Function: $\text{Loss} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$
#### 	•	Ridge Loss Function: $\text{Loss} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$
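
The practical difference between the two penalties shows up in the fitted coefficients. The sketch below uses synthetic data in which only two of six features matter. `alpha` in scikit-learn plays the role of $\lambda$, up to the library's own scaling of the loss.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only the first two columns drive the target; the other four are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso:", np.round(lasso.coef_, 3))
print("Ridge:", np.round(ridge.coef_, 3))
print("Lasso zeros:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zeros:", int(np.sum(ridge.coef_ == 0)))
```

Lasso drives the noise coefficients to exactly zero and shrinks the informative ones, while Ridge keeps every coefficient nonzero but small.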

#### 37. What are outliers, and how do you handle them in regression?
#### 	•	Answer:
#### 	•	Outliers are extreme values that deviate significantly from the rest of the data.
#### 	•	Handling methods:
#### 	•	Remove if they are errors.
#### 	•	Use robust regression techniques.
#### 	•	Apply transformations (e.g., log, square root).

#### 38. How would you explain a low R-Squared value in your model?
#### 	•	Answer:
#### 	•	A low R-Squared may indicate:
#### 	•	Poor predictors.
#### 	•	High variability in data not captured by predictors.
#### 	•	Steps to address:
#### 	•	Add more relevant features.
#### 	•	Use polynomial or interaction terms.
#### 	•	Check for nonlinear relationships.

#### 39. Explain the steps to build and evaluate a regression model.
#### 	•	Answer:
#### 	1.	Data Exploration: Understand the data, check distributions, and identify outliers/missing values.
#### 	2.	Feature Engineering: Create or transform features.
#### 	3.	Model Building: Fit a regression model (linear, logistic, etc.).
#### 	4.	Model Evaluation: Use metrics like RMSE, R-Squared, or MAE.
#### 	5.	Interpretation: Analyze coefficients and residuals.

#### 40. What is overfitting, and how do you handle it?
#### Answer:
#### Overfitting occurs when a model learns the noise and details of the training data too well, leading to poor generalization on unseen data. The model is too complex relative to the dataset.
#### Ways to handle overfitting:
#### 	1.	Reduce Model Complexity:
#### 	•	Simplify the model by reducing the number of predictors (feature selection).
#### 	•	Remove polynomial or interaction terms if they are unnecessary.
#### 	2.	Regularization:
#### 	•	Use techniques like L1 (Lasso) or L2 (Ridge) regularization to penalize large coefficients and prevent overfitting.
#### 	3.	Cross-Validation:
#### 	•	Use cross-validation (e.g., k-fold) to evaluate model performance and detect overfitting during training.
#### 	4.	Use More Training Data:
#### 	•	Increase the size of the training set, if possible, to provide the model with more diverse examples.
#### 	5.	Drop Irrelevant Features:
#### 	•	Use feature selection methods like Recursive Feature Elimination (RFE) or Lasso to remove features that don’t contribute meaningfully.
	

#### 41. What is underfitting, and how do you handle it?
#### Answer:
#### Underfitting happens when the model is too simple to capture the underlying structure of the data. It performs poorly on both the training and test datasets.

#### Ways to handle underfitting:
#### 	1.	Increase Model Complexity:
#### 	•	Add more features or predictors that are relevant to the target variable.
#### 	•	Include polynomial or interaction terms to capture nonlinear relationships.
#### 	2.	Reduce Regularization:
#### 	•	If using regularization, decrease the penalty to allow the model more flexibility.
#### Example:
             from sklearn.linear_model import Lasso

             # X_train and y_train are assumed to be prepared earlier.
             model = Lasso(alpha=0.01)  # Lower alpha reduces regularization
             model.fit(X_train, y_train)

#### 	3.	Use a More Sophisticated Model:
#### 	•	Replace a simple model (e.g., linear regression) with a more complex one (e.g., decision trees, random forests, or gradient boosting).
#### 	4.	Tune Hyperparameters:
#### 	•	Optimize parameters of the regression algorithm to improve performance.
#### •	Example (using GridSearchCV):
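        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import GridSearchCV

        # X_train and y_train are assumed to be prepared earlier.
        param_grid = {'n_estimators': [50, 100, 150], 'max_depth': [None, 10, 20]}
        model = RandomForestRegressor(random_state=42)  # fixed seed for repeatability
        grid_search = GridSearchCV(model, param_grid, cv=5)
        grid_search.fit(X_train, y_train)
        print(grid_search.best_params_)  # best combination found by the search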

### AWS RedShift Questions

#### 41. What is Amazon Redshift?
#### 	•	Answer:
#### Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It allows organizations to run complex analytical queries against structured and semi-structured data stored across data warehouses, operational databases, and data lakes.

#### 42. How does Amazon Redshift store data?
#### 	•	Answer:
#### 	•	Redshift uses columnar storage, where data is stored by columns instead of rows. This enables faster analytical queries since only the relevant columns are read for a query.
#### 	•	Data is distributed across the compute nodes (and their slices) according to the table's distribution style: AUTO (the default), EVEN, KEY, or ALL.

#### 43. What are distribution styles in Redshift, and when do you use them?
#### 	•	Answer:
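#### 	•	AUTO: Redshift picks the style based on table size and adjusts it as the table grows. Used when no style is specified.
#### 	•	EVEN: Distributes rows round-robin across slices, regardless of their values. Best for tables with no clear join keys.
#### 	•	KEY: Distributes rows based on the value of a distribution key, so matching values land on the same slice. Best for frequently joined tables to minimize data shuffling.
#### 	•	ALL: Replicates the entire table to all nodes. Best for small dimension tables frequently joined with other tables.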

#### 44. How does Redshift handle backup and recovery?
#### 	•	Answer:
#### 	•	Automated Snapshots: Redshift periodically takes incremental snapshots of the cluster, stores them in Amazon S3, and keeps them for a configurable retention period.
#### 	•	Manual Snapshots: Users can create manual snapshots that persist even if the cluster is deleted.
#### 	•	Cross-Region Snapshots: Snapshots can be copied to a different region for disaster recovery.

### Questions on Exploratory Data Analysis (EDA)

#### 45. What is Exploratory Data Analysis (EDA), and why is it important?
#### Answer:
#### 	•	EDA is the process of analyzing datasets to summarize their main characteristics, often with visualizations, to uncover patterns, anomalies, and relationships in the data.
#### 	•	Importance:
#### 	•	Understand data distribution, quality, and structure.
#### 	•	Identify outliers and missing values.
#### 	•	Select appropriate models and transformations.

#### 46. What are the key steps in performing EDA?
#### Answer:
#### 	•	Understand the data context and objectives.
#### 	•	Load and inspect the data (e.g., data types, dimensions).
#### 	•	Check for missing values and duplicates.
#### 	•	Analyze statistical summaries (mean, median, variance).
#### 	•	Visualize distributions, correlations, and trends (e.g., histograms, scatter plots).
#### 	•	Handle outliers and missing values.

#### 47. Which Python libraries are commonly used for EDA?
#### Answer:
#### 	•	Pandas: Data manipulation and analysis.
#### 	•	NumPy: Numerical operations.
#### 	•	Matplotlib/Seaborn: Data visualization.
#### 	•	SciPy: Statistical analysis.
#### 	•	Plotly: Interactive visualizations.

#### 48. How do you handle missing data during EDA?
#### Answer:
#### 	•	Imputation: Fill missing values using techniques like mean, median, or mode.
#### 	•	Deletion: Remove rows/columns with a high percentage of missing values.
#### 	•	Model-Based Imputation: Predict missing values using machine learning models.
#### 	•	Indicator Variable: Create a separate binary column to indicate missingness.

#### 49. How do you detect and handle outliers in data?
#### Answer:
#### 	•	Detection Methods:
#### 	•	Box plots, scatter plots, z-scores, or the IQR method.
#### 	•	Handling Outliers:
#### 	•	Remove: If the outliers are errors or irrelevant.
#### 	•	Transform: Use log or square root transformations to reduce their impact.
#### 	•	Cap: Winsorize the data to limit extreme values.
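
A sketch of the detection and handling options above on a small made-up series with one obvious outlier. Capping at the IQR fences is used here as one simple variant of winsorizing; percentile-based limits are another common choice.

```python
import numpy as np
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 11, 14, 13, 12, 95], dtype=float)

# IQR rule: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

# Z-score: distance from the mean in standard-deviation units.
z = (values - values.mean()) / values.std(ddof=0)

# Capping: clip extreme values to the fences instead of dropping them.
capped = values.clip(lower=lower, upper=upper)

# Log transform: compresses large values (requires positive data).
logged = np.log(values)

print(values[is_outlier].tolist())
print(round(z.max(), 2), capped.max(), round(logged.max(), 2))
```

In very small samples the z-score of a single extreme point is bounded, so a fixed cutoff such as 3 can miss it. The IQR rule does not have that limitation.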

#### 51. What is the difference between univariate, bivariate, and multivariate analysis?
#### Answer:
#### 	•	Univariate Analysis: Examines one variable at a time (e.g., histograms, box plots).
#### 	•	Bivariate Analysis: Examines the relationship between two variables (e.g., scatter plots, correlation matrices).
#### 	•	Multivariate Analysis: Explores relationships among more than two variables (e.g., pair plots, PCA).

#### 52. How do you check for multicollinearity in a dataset?
#### Answer:
#### 	•	Variance Inflation Factor (VIF): A high VIF (commonly above 5 or 10, depending on the rule of thumb) indicates multicollinearity.
#### 	•	Correlation Matrix: Check for high correlations between predictors.
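
VIF can be computed directly from its definition, $\text{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors. The helper `vif` below is a hypothetical function written for this sketch, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def vif(X: np.ndarray) -> np.ndarray:
    """Return the variance inflation factor of each column of X."""
    scores = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        # R^2 from regressing column j on all remaining columns.
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)


rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)  # nearly a copy of x1
x3 = rng.normal(size=300)                  # independent predictor
X = np.column_stack([x1, x2, x3])

print(np.round(vif(X), 2))
print(np.round(np.corrcoef(X, rowvar=False), 2))
```

The two nearly identical columns get very large VIF values, while the independent column stays close to 1.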

#### 53. How do you handle high-dimensional datasets during EDA?
#### Answer:
#### 	•	Use dimensionality reduction techniques like PCA or t-SNE (the latter mainly for visualization).
#### 	•	Perform feature selection using correlation analysis, feature importance, or recursive feature elimination.
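
A sketch of PCA on synthetic high-dimensional data. It assumes 20 features driven by 2 hidden factors, and a 95% explained-variance target chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# 20 observed features generated from only 2 latent factors plus noise.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 20))
X = latent @ mixing + rng.normal(scale=0.05, size=(300, 20))

# PCA is scale-sensitive, so standardize the features first.
X_scaled = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)
print(np.round(pca.explained_variance_ratio_, 3))
```

Passing a float to `n_components` lets PCA pick the number of components from the variance target instead of fixing it up front.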

### Questions on Data Governance

#### 54. What is data governance, and why is it important?
#### Answer:
#### 	•	Data governance is the practice of managing data’s availability, usability, integrity, and security within an organization.
#### 	•	Importance:
#### 	•	Ensures data quality and compliance with regulations.
#### 	•	Reduces data silos.
#### 	•	Enhances decision-making with reliable data.

#### 55. What are the main challenges in implementing data governance?
#### Answer:
#### 	•	Lack of executive support and cultural resistance.
#### 	•	Poor data quality and siloed systems.
#### 	•	Difficulty in aligning policies across departments.
#### 	•	Complex regulatory environments.


#### 56. What is data lineage, and why is it important?
#### Answer:
#### 	•	Data lineage traces the flow of data from its origin to its final destination.
#### 	•	Importance:
#### 	•	Provides visibility into data transformations.
#### 	•	Helps in troubleshooting and impact analysis.
#### 	•	Ensures compliance with regulations.

#### 57. What is the role of metadata management in data governance?
#### Answer:
#### 	•	Metadata management involves creating and maintaining a repository of information about data (e.g., its source, structure, and transformations).
#### 	•	It improves data discoverability, quality, and compliance.

#### 58. How do you measure the success of a data governance program?
#### Answer:
#### 	•	Metrics for success:
#### 	•	Improvement in data quality scores.
#### 	•	Reduction in data breaches.
#### 	•	Increased compliance with regulations.
#### 	•	Enhanced user satisfaction with data availability.

#### 59. What are the key components of a data governance framework?
#### Answer:
#### 	•	Data Ownership: Clear ownership of data assets.
#### 	•	Policies and Standards: Guidelines for data usage and handling.
#### 	•	Data Quality Management: Ensures accurate, complete, and consistent data.
#### 	•	Data Security and Privacy: Protect data from breaches and ensure compliance.
#### 	•	Metadata Management: Centralizes information about data (e.g., source, structure, purpose).

#### 60. How do you balance data accessibility and security in governance?
#### Answer:
#### 	•	Use role-based access control (RBAC) to limit data access based on roles.
#### 	•	Encrypt sensitive data and anonymize it when sharing.
#### 	•	Monitor access logs for unauthorized activity.