
A few comments #1

Open
knadh opened this issue Dec 12, 2023 · 0 comments

knadh commented Dec 12, 2023

Thank you for opening up the document for comments!

Sharing a few quick thoughts:

  1. There is good focus on the marketplace, industry, and state capacity, but community and civil society seem to be missing as stakeholders. The impact of these technologies on disparate communities is going to be significant. It's also important to note that, outside the industry, the lion's share of incremental breakthroughs and innovation is happening in tinkering communities.

  2. In the same vein, imagine a Wikipedia-style, Wikipedia-scale non-profit AI community service. How would that fit into the overall thesis here?

Ownership of Data

A similar framework could also be applied to unlock the value of various other data locked in silos for specific purposes approved by the user. Under such a framework, users will also have the option to opt out of their data being used for training purposes.

Data portability to unlock value for users across markets and also prevent undesirable platform lock-in effects. For example, a seller on one e-commerce platform should be able to transfer their authentic product reviews to another e-commerce platform. Likewise, a user's health data from a fitness tracker can be shared with their chosen healthcare provider to facilitate the creation of a personalised health plan. Data request templates need to be established for various use cases in consultation with stakeholders across industry and academia.

Enable data provenance and usage transparency. A data aggregation framework also offers information about the origin and lineage of a dataset available within the market. In addition, it can also serve as an audit trail for the downstream models where a dataset is being used.

  1. I think this entire section may be introducing significant scope-creep to the context of the document (where the scope is training data that goes into AI systems). Concepts like universal data portability, provenance, etc. fall outside this realm and are entire, complex fields in themselves (conceptually and technically, and also philosophically). They apply to the use of user data in any context, be it training AI systems, mathematical models, or advertisement-targeting systems.
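
To give a sense of why provenance and lineage alone are sizeable engineering problems, here is a minimal sketch of what a single provenance/audit record for a training dataset might contain. This is purely illustrative: none of the field names come from the document or from any existing standard.

```python
# Illustrative only: a minimal shape for a dataset provenance / audit-trail record.
# All field names are assumptions, not part of any proposed framework.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProvenanceRecord:
    dataset_id: str       # stable identifier for the dataset in the marketplace
    source: str           # where the raw data originated (platform, sensor, survey)
    collected_under: str   # consent / licence reference the data was collected under
    content_hash: str     # hash of the dataset snapshot, for tamper evidence
    derived_from: List[str] = field(default_factory=list)    # upstream dataset_ids (lineage)
    used_by_models: List[str] = field(default_factory=list)  # downstream models (audit trail)

record = ProvenanceRecord(
    dataset_id="ds-00123",
    source="fitness-tracker-exports",
    collected_under="consent-artefact-9f2c",
    content_hash="sha256:<digest>",
    derived_from=["ds-00087"],
    used_by_models=["model-health-plan-v2"],
)
```

Even this toy record already drags in hashing, lineage tracking, consent linkage, and downstream-use reporting, which is the scope-creep concern in a nutshell.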

Individuals should be able to dictate if, how, and by whom their data can be used. While it is common practice for firms to aggregate and transform the data they capture for internal and external use, individuals typically lack the means to do the same. India's Data Empowerment and Protection Architecture (DEPA) and the Account Aggregator Framework built on top of it illustrate a consent-based intermediary system [14] to facilitate such a transaction.

  1. AA provides a structured, smooth system for users to share data with requesters. However, firms aggregating and transforming data from various sources and signals is a separate matter from organisations using a system like AA to request data; those practices can continue regardless of whether AA is adopted.
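
For illustration only, a rough sketch of the consent-gated request flow the quoted passage describes, including the opt-out from training use proposed earlier in the section. This is not the actual DEPA/ReBIT Account Aggregator schema; every field name and function here is an assumption.

```python
# Rough, hypothetical sketch of a consent-gated data request check.
# NOT the DEPA / Account Aggregator consent-artefact format.
from datetime import datetime, timezone

consent = {
    "data_principal": "user-42",               # the individual whose data is requested
    "data_consumer": "healthcare-provider-7",  # who is allowed to request it
    "purpose": "personalised-health-plan",
    "data_types": ["fitness-tracker"],         # what may be shared
    "valid_until": "2024-06-30T00:00:00+00:00",
    "allow_training_use": False,               # opt-out of model-training use
}

def is_request_permitted(consent: dict, consumer: str, data_type: str, for_training: bool) -> bool:
    """Check a data request against the consent record before any data moves."""
    if consumer != consent["data_consumer"]:
        return False
    if data_type not in consent["data_types"]:
        return False
    if for_training and not consent["allow_training_use"]:
        return False
    return datetime.now(timezone.utc) < datetime.fromisoformat(consent["valid_until"])

# Note: this only governs data requested *through* the intermediary. Nothing here
# stops a firm from separately aggregating data it already collects outside this flow,
# which is the point made above.
```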

We propose the creation of a sector agnostic entity staffed with data scientists and cybersecurity experts to ensure data uniformity and compliance with best practices. Its mandate would include creating a data engineering plan, conducting audits of different state entities, and enforcing data compliance standards.

  1. In addition to adopting and following universal data standards within the govt., enforcing a "public money, public code" / "public money, public data" approach seems pertinent here. To cite an example, there are various govt. bodies that use taxpayer money to create language training corpora, dictionary datasets, etc., and keep them locked away. Such datasets should be in the public domain.
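
As a purely hypothetical illustration of what a "public money, public data" rule could look like in practice, here is a minimal metadata manifest a sector-agnostic standards body might require before a publicly funded dataset is released. The field names, publisher, and URL are all made up.

```python
# Hypothetical sketch only: a minimal manifest for publicly funded datasets
# (e.g. language corpora), so they are published uniformly and in the open.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0", "ODC-PDDL-1.0"}

manifest = {
    "title": "Hindi-English parallel corpus",
    "publisher": "Example Government Body",            # hypothetical publisher
    "funding": "public",                               # triggers the public-money-public-data rule
    "licence": "CC0-1.0",                              # public-domain dedication
    "format": "tsv",
    "checksum": "sha256:<digest>",
    "download_url": "https://example.org/corpus.tsv",  # placeholder URL
}

def complies(m: dict) -> bool:
    """A publicly funded dataset must carry an open licence and a public download URL."""
    if m["funding"] != "public":
        return True
    return m["licence"] in OPEN_LICENCES and m["download_url"].startswith("https://")

assert complies(manifest)
```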

For instance, CSPs could be compelled to share or disclose such data with foreign governments or agencies, and they might also be susceptible to cyberattacks or sabotage by malicious actors.

  1. Cyberattacks and sabotage can happen to data stored anywhere; any networked system is susceptible. Whether the CSPs are foreign or domestic is irrelevant.

... as quantum computing, ... and edge computing, which can enhance the performance and efficiency of AI applications

This is not correct.
