In [None]:
# Notebook for Simon's session 09:30 - 12:30. #

In [1]:
import numpy as np
import pandas as pd 
import seaborn as sns 
import matplotlib.pyplot as plt
from sklearn import preprocessing 


In [None]:
# Tip from Simon: look into itertools, functools, and collections. (These are standard-library modules to explore as you move from beginner to intermediate programming.)
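
As a starting point for that exploration, here is a minimal sketch showing one commonly used tool from each module. The standard-library module names are `itertools` and `functools`, written without an underscore. The sample data and the `fib` helper are made up for illustration.

```python
from collections import Counter
from functools import lru_cache
from itertools import combinations

# Made-up sample data for illustration.
words = ["data", "ethics", "data", "privacy", "ethics", "data"]

# collections.Counter tallies how often each item appears.
counts = Counter(words)
print(counts.most_common(2))

# itertools.combinations yields every unordered pair exactly once.
pairs = list(combinations(["a", "b", "c"], 2))
print(pairs)


# functools.lru_cache remembers earlier results, so each value of
# fib(n) in the recursion below is computed only once.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))
```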

# Today's Session: Ethics in Data Science with Simon #

Key components of ethics include:

Moral Principles: These are fundamental truths or rules that guide ethical behavior. Common examples include honesty, integrity, fairness, and respect for others.

Values: Values are deeply held beliefs that individuals or societies consider important. They influence ethical decision-making and behavior. Cultural, religious, and personal values play a significant role in shaping ethical frameworks.

Moral Dilemmas: Ethics often involves navigating situations where different moral principles or values come into conflict, leading to moral dilemmas. Resolving such dilemmas requires careful consideration and prioritization of ethical principles.

Norms and Standards: Societies often establish norms and standards that reflect their ethical expectations. These can be formalized through laws, codes of conduct, and professional ethics.

Consequences and Intentions: Ethical analysis often considers both the consequences of an action and the intentions behind it. Frameworks that emphasize consequences are called consequentialist; those that emphasize duties and the intentions behind an action are called deontological.

Cultural and Relativistic Perspectives: Ethical standards can vary across cultures and societies. Ethical relativism acknowledges that what is considered morally acceptable may differ between cultures, and ethical judgments should be made within the context of a particular culture.

Virtue Ethics: This approach focuses on developing virtuous character traits, such as courage, wisdom, and compassion, as the foundation for ethical behavior. It emphasizes the importance of cultivating good character rather than adhering strictly to rules.

Applied Ethics: This branch of ethics addresses specific issues within various fields, such as business ethics, medical ethics, environmental ethics, and bioethics. It involves the application of ethical principles to real-world situations.

Praxis: The actions you take in line with your morals and ethics; that is, putting ethics into practice.

# Life Cycle of Data #

1: Generation: Data is created through various processes, such as manual entry, automated systems, sensors, or other data collection methods. This stage assumes the objective has already been set.

2: Collection, Sampling: Data is gathered from different sources, including surveys, transactions, sensors, social media, etc.

3: Organization: Data is entered, structured, and organized for efficient storage and retrieval.

4: Storage: Data is stored in databases, data warehouses, or other storage systems. This stage involves decisions about where and how to store data based on factors like accessibility, security, and performance requirements.

5: Processing/Analysis:

Transformation: Data may undergo preprocessing or transformation to ensure quality and consistency.
Analysis: Data is analyzed for insights, patterns, trends, or to derive meaningful information. This stage often involves the use of analytics tools, machine learning algorithms, and statistical methods.

6: Usage:

Decision-making: Insights from data are used to make informed decisions in various domains, such as business, science, healthcare, etc.
Reporting: Data is presented in a meaningful format through reports, dashboards, or visualizations to facilitate understanding.
Distribution: Relevant data may be shared within an organization or with external entities.
Communication: Data-driven insights are communicated to stakeholders through various channels.
Implications: Explore what the insights from the data actually tell us and what follows from them.

7: Archiving:

Retention: Some data is retained for historical or compliance purposes, even if it is no longer actively used.
Archival: Data is moved to archival storage to free up resources in the primary storage systems.

8: Deletion/Disposal:

End of Life: Data that is no longer needed or has reached the end of its useful life is securely deleted or disposed of.
Data Privacy Compliance: Considerations for data privacy regulations may influence how and when data is deleted.
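
Two of the stages above lend themselves to a short sketch: transformation (stage 5) and retention-based deletion (stage 8). The example below is a minimal illustration using pandas, not a prescribed workflow. The column names, the five-year retention period, and the fixed reference date are all assumptions made for the example.

```python
import pandas as pd

# Made-up records; the column names are hypothetical.
records = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "spend": [10.0, 20.0, 30.0, 40.0],
    "collected_on": pd.to_datetime(
        ["2015-01-01", "2019-09-01", "2023-03-01", "2024-01-15"]
    ),
})

# Stage 5 (transformation): z-score standardization, so the new column
# has mean 0 and standard deviation 1 (population std, ddof=0).
spend = records["spend"]
records["spend_z"] = (spend - spend.mean()) / spend.std(ddof=0)

# Stage 8 (deletion): keep only rows collected within the retention period.
RETENTION = pd.DateOffset(years=5)   # hypothetical retention policy
as_of = pd.Timestamp("2024-06-01")   # fixed date so the result is reproducible
cutoff = as_of - RETENTION
retained = records[records["collected_on"] >= cutoff].reset_index(drop=True)
print(retained)
```

Filtering a DataFrame only removes rows from the in-memory copy. Secure deletion also has to cover the underlying files, databases, and backups, which is outside the scope of this sketch.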

# Ethics of Data Collection # 


When collecting and using data, there are several ethical considerations that individuals and organizations should take into account to ensure responsible and fair practices. Some key ethical issues include:

Privacy:

Informed Consent: Ensure individuals are aware of how their data will be collected, used, and shared, and obtain their explicit consent.
Anonymity and Confidentiality: Protect the identity of individuals by anonymizing data whenever possible and maintaining confidentiality.

Transparency:

Openness: Be transparent about data collection methods, purposes, and intended uses. Provide clear explanations to users about how their data will be handled.

Data Accuracy and Quality:

Accuracy: Strive to collect accurate and reliable data, and take steps to validate and verify the quality of the data.
Integrity: Avoid intentional manipulation or distortion of data that could lead to biased results or misinterpretation.

Fairness and Equity:

Bias Mitigation: Be aware of and address potential biases in data collection, analysis, and interpretation to ensure fair and unbiased outcomes.
Equitable Access: Ensure that data-driven benefits are distributed fairly and that the collection process does not disproportionately affect certain groups.

Security:

Data Security: Implement measures to protect data from unauthorized access, breaches, and cyber threats.
Data Retention: Establish policies for the appropriate retention and disposal of data to minimize security risks.

Ownership and Control:

Data Ownership: Clarify who owns the data and under what conditions it can be shared or used by third parties.
User Control: Empower individuals to control their data by providing options for opting in or out of data collection and sharing.

Consent and Voluntariness:

Coercion: Ensure that individuals are not coerced into providing their data and that participation is voluntary.
Withdrawal: Allow individuals the right to withdraw their consent and have their data removed if desired.

Accountability:

Responsibility: Clearly define roles and responsibilities for data handling and ensure accountability for ethical lapses.
Compliance: Adhere to relevant laws, regulations, and industry standards governing data collection and usage.

Community and Social Impact:

Community Involvement: Engage with communities affected by data collection and use to understand their concerns and needs.
Social Responsibility: Consider the broader societal implications of data use and strive to contribute positively to societal well-being.

Emerging Technologies:

Ethical AI: Ensure that the use of artificial intelligence and machine learning is guided by ethical principles, addressing issues such as bias, fairness, and accountability.

Data Minimization: Collect only the personal data you actually need for the stated purpose, and nothing extra.
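
Data minimization and the anonymity point under Privacy can be illustrated together. The sketch below keeps only the columns an analysis needs, then replaces the direct identifier with a keyed hash. This is pseudonymization rather than full anonymization, because anyone holding the key can recompute the mapping. All column names, the `pseudonymize` helper, and the placeholder key are assumptions for the example.

```python
import hashlib
import hmac

import pandas as pd

# Made-up raw data; every column name here is hypothetical.
raw = pd.DataFrame({
    "email": ["ana@example.com", "bo@example.com"],
    "full_name": ["Ana A.", "Bo B."],
    "date_of_birth": ["1990-01-01", "1985-05-05"],
    "purchase_total": [42.5, 17.0],
})

# Data minimization: keep only the columns the analysis actually needs.
NEEDED = ["email", "purchase_total"]
minimal = raw[NEEDED].copy()

# Placeholder key. In practice it would be kept secret and stored
# separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str) -> str:
    """Return a keyed SHA-256 hash (HMAC) of the given identifier."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Replace the direct identifier with its pseudonym, then drop the original.
minimal["user_key"] = minimal["email"].map(pseudonymize)
minimal = minimal.drop(columns=["email"])
print(minimal)
```

The same person always maps to the same `user_key`, so records can still be linked across tables without exposing the email address itself.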


# Ethics of Data Analysis #

Ethical issues can arise at various stages of the data analysis process. Here are some key ethical considerations related to data analysis:

Bias and Fairness:
Algorithmic Bias: Data analysis methods, particularly machine learning algorithms, can perpetuate or amplify existing biases present in the data. This can lead to unfair or discriminatory outcomes.
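
One simple first check for this kind of bias is to compare outcome rates across groups. The sketch below computes the share of positive decisions per group and the ratio of the lowest rate to the highest, sometimes called the disparate impact ratio. The data, group labels, and column names are made up. A gap between groups is a prompt for further investigation, not proof of unfairness on its own.

```python
import pandas as pd

# Made-up decisions (1 = approved, 0 = rejected) for two hypothetical groups.
decisions = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "approved": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Share of positive outcomes within each group.
rates = decisions.groupby("group")["approved"].mean()

# Ratio of the lowest rate to the highest; 1.0 would mean equal rates.
ratio = rates.min() / rates.max()

print(rates)
print(f"selection-rate ratio: {ratio:.2f}")
```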

Privacy Concerns:
Re-identification Risk: Aggregated and anonymized data can sometimes be re-identified, posing a risk to individuals' privacy. Analysts must consider the potential for unintended disclosure.
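
A rough way to gauge this risk is a k-anonymity check: count how many rows share each combination of quasi-identifiers (fields that are not names but can single someone out when combined). The sketch below uses made-up rows, and the choice of quasi-identifier columns is an assumption. A smallest group size of 1 means at least one person is uniquely identifiable from those fields alone.

```python
from collections import Counter

# Made-up "anonymized" rows: no names, but several quasi-identifiers remain.
rows = [
    {"postcode": "AB1", "birth_year": 1990, "sex": "F"},
    {"postcode": "AB1", "birth_year": 1990, "sex": "F"},
    {"postcode": "AB1", "birth_year": 1985, "sex": "M"},
    {"postcode": "CD2", "birth_year": 1990, "sex": "F"},
    {"postcode": "CD2", "birth_year": 1990, "sex": "F"},
]
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")  # hypothetical choice

# Count how many rows share each combination of quasi-identifier values.
group_sizes = Counter(tuple(row[col] for col in QUASI_IDENTIFIERS) for row in rows)

# The dataset is k-anonymous for k equal to the smallest group size.
k = min(group_sizes.values())

# Combinations held by a single row point to exactly one person.
unique_combos = [combo for combo, n in group_sizes.items() if n == 1]
print(k, unique_combos)
```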

Data Manipulation and Integrity:
Data Falsification: Intentional manipulation of data to achieve specific outcomes or to mislead stakeholders is unethical. Analysts should maintain the integrity of the data throughout the analysis process.

Transparency:
Opaque Models: Lack of transparency in complex models can make it difficult to understand how decisions are made. Transparency is crucial for accountability and user trust.

Informed Consent:
Lack of Informed Consent: If the data used in the analysis were originally collected without informed consent, or if the analysis involves unexpected uses of the data, ethical concerns may arise.

Ownership and Intellectual Property:
Data Ownership: Clarifying who owns the data and respecting intellectual property rights is important to ensure ethical data analysis practices.

Stakeholder Communication:
Miscommunication of Results: Presenting analysis results in a way that misrepresents findings or exaggerates their significance can be misleading and unethical.

Security:
Inadequate Data Security: Failure to secure data during the analysis process can lead to data breaches, putting individuals at risk and violating ethical standards.

Impact on Vulnerable Populations:
Disproportionate Effects: Analyzing data that disproportionately affects vulnerable or marginalized populations requires extra care to avoid exacerbating existing inequalities.

Long-Term Consequences:
Unintended Consequences: Analysts should consider the potential long-term consequences of their analyses, including social, economic, and environmental impacts.

Responsible Use of Predictive Analytics:
Predictive Analytics Risks: Ethical concerns can arise when using predictive analytics to make decisions about individuals' behavior, particularly when those predictions impact people's lives (e.g., credit scoring, hiring decisions).

Data Stewardship:
Responsible Data Handling: Ethical data analysts adhere to responsible data stewardship practices, respecting data usage policies and guidelines.

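Also consider the validity of your measures: content validity (does the measure cover all relevant aspects of the concept?), ecological validity (do the findings hold in real-world settings?), and construct validity (does the measure actually capture the concept it claims to measure?).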