Permalink
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
118 lines (82 sloc) 10.8 KB

Important notice: the following datasets contain personal data taken from the AFSP website. The data have been compiled for research purposes only. According to French law, you can ask the maintainer of the data for access, rectification or even suppression of the data (although the latter requires 'legitimate' reasons to do so). To exercise any of those rights, please email me.

All files documented below are UTF-8 encoded. All values are unquoted, and missing values are coded as NA. None of the files should contain any duplicated rows.

edges.csv

A CSV file with one row per conference participant and per conference panel attended:

  • year – Year of AFSP conference (2009, 2011, 2013, 2015, 2017).
  • i – Full name of the participant, coded as FAMILY NAME FIRST NAME, all uppercase, without accents or lone initials, and sometimes harmonised to allow for matching across conference years.
  • j – Panel attended, coded as YEAR_ID, where ID contains the type of panel (e.g. CP for plenary conferences, ST for thematic sessions), followed by the alphanumeric identifier of the panel when there was one.
  • n_j – Number of participants to the conference panel.
  • n_p – Number of conference panels attended that year by the participant.
  • t_p – Total number of panels attended by the participant.
  • t_c – Total number of conferences attended by the participant.
  • first_name – First name(s) of the participant, extracted from i.
  • family_name – Family name of the participant, extracted from i.
  • gender – Gender of the participant, coded as f for female and m for male.

Note – The gender variable is based on sex ratios computed from the Fichier Prénoms Insee, 2016 edition, which will be downloaded to the data folder during data preparation, as well as on manual additions provided in genders.tsv.

fixes.tsv

A TSV (tab-separated) file with one row per conference participant and per conference panel attended for which a fix was identified by manually checking the original data, and therefore to be used to fix errors in edges.csv:

  • type – type of fix to perform:
    • add: append this row (i.e. the participant is entirely missing from the edges.csv, i.e. from participant indexes)
    • err: remove the row ("error" found either in the panel identifier, and/or in the identity of the participant)
    • abs: remove or ignore prior to analysis, depending on analytical strategy; indicates that the participant has been manually confirmed as absent (i.e. unlisted) from the panel data, which should match value a in variable role in participants.tsv
  • i – Full name of the participant, either coded exactly as i in edges.csv or corrected for misspellings and other errors.
  • j – Panel attended, coded exactly as j in edges.csv.

genders.tsv

A TSV (tab-separated) file with one row per conference participant present in edges.csv for which gender could not be determined from the Fichier Prénoms Insee, 2016 edition (see note above):

  • gender – Gender of the participant:
    • Coded as f for female and m for male.
    • All missing values so far have been manually inputed.
  • name – Full name of the participant, coded exactly as i in edges.csv.

This file can be manually revised to improve the completeness of the gender variable in edges.csv. The file will be loaded during data preparation, updated with any new participant names for which gender could not be determined, and then re-saved.

incidence_matrix.rds

A serialized R object of class matrix representing the i (conference participant) × j (conference panel) incidence matrix contained in edges.csv, with each edge inversely weighted by 1 / nj, where nj denotes the total number of participants to panel j. Because all conference panels have at least two participants, the edge weights have a maximum value of 0.5.

names.tsv

A TSV (tab-separated) file with one row per participant present in edges.csv for which the name needed to be manually modified for any reason (usually typos or inconsistencies across conference years):

  • year – Year of AFSP conference.
  • i – Full name of the participant, as found in the original data.
  • i_fixed – Corrected full name of the participant, coded exactly as i in edges.csv.

Note – The corrections apply the simplifications listed in the documentation for edges.csv, as well as some additional simplifications to foreign names: for instance, Korean first names (e.g. KIL-HO or SUNG-EUN) are simplified by removing the dash, as seems to have been commonly done in the original data.

panels.tsv

A TSV (tab-separated) file with one row per conference panel:

  • year – Year of AFSP conference.
  • id – Panel identifier that matches the ID part of the j variable in edges.csv.
  • title – Panel title, slightly cleaned up:
    • Multiples spaces were replaced by a single one.
    • Double quotes are coded as « French quotes ».
    • Single quotes are coded as .
    • Unbreakable spaces are used before :;?! and before/after double quotes.
    • Full stops at the end of titles were removed.
    • All instances of État (the State) are accentuated.
  • notes – Notes, in French, when available (e.g. to indicate the panel was postponed).

The data were manually extracted from the relevant AFSP Web pages. A handful of panels listed in the file have no participants listed in edges.csv, for various reasons (e.g. the panel was cancelled or postponed, the panel is a PhD workshop with no participants list).

This file contains slightly better formatted panel titles than those collected during data preparation, and should therefore be preferred when requesting that information. The information contained in the notes column are exclusive to that file.

participants.tsv

A TSV (tab-separated) file with one row per conference participants and per conference panel attended (see edges.csv below):

  • role – Role of the participant within the panel:
    • Programmatically identified roles: o (organiser), p (presenter); those roles are the only ones that can be trusted to be somewhat reliably coded for most panels. In particular, all panels with less than two organisers have been manually checked and corrected when needed.
    • Manually identified roles: c and d (chair or discussant who is not also a presenter), a (absentee, i.e. participant listed in the conference index but not listed anywhere in the panel page).
    • The role is coded as e (for "else") if the participant is listed at the end of the panel page but does not appear anywhere else on the page.
  • i – Full name of the participant, coded exactly as i in edges.csv.
  • j – Panel attended, coded exactly as j in edges.csv.
  • affiliation – Academic affiliation, standardized to a reasonable level:
    • When available, the affiliation starts with the acronym or name of the research unit, which might be a department, an institute, a research laboratory or team, etc. Merged units contain both names separated by dashes, e.g. GSPE-PRISME-SAGE.
    • The affiliation then usually contains the name of the university or other institution that hosts the research unit. All linguistic variants of the word "university" are replaced with U., and Parisian universities are denoted by their post-1968 number in Arabic form (e.g. "U. PARIS 11"). Some units are co-hosted by several institutions separated with ampersands and/or dashes, e.g. IEP-U. STRASBOURG or ENS-PARIS & EHESS PARIS.
    • When the institution is located in France, an effort is made to include the city in its name, e.g. INSEEC BORDEAUX. When the institution is located outside of France, the country is then indicated in brackets, in French, at the exception of USA. This also applies to some French institutions located abroad, and does not apply to the American University in Paris.
    • Non-academic affiliations, which can be either institutions (e.g. "UNESCO"), or occupations (e.g. "consultant"), are surrounded by [hard brackets] and might include a geographic indication (see previous point).
    • All other information (irregularly) reported in the raw data have been removed, including, for instance, CNRS, FNRS and IUF affiliations, memberships to informal research groups or to funded projects (e.g. ANR, ERC), and doctor or professor titles.
    • Multiple affiliations are listed by their original order of appearance and are separated with slashes, as in x, y (z) / a, b (in this example, the first affiliation is from an institution located outside of France).

Although an effort has been made to harmonize affiliations, many of the affiliations listed in this file are either lowly accurate, incomplete, or missing entirely.

This file can be manually revised to improve the accuracy of the affiliation variable in edges.csv. Its role variable will get copied to edges.csv during data preparation.